Probes for variance detection

ABSTRACT

Disclosed are methods and reagents for detecting nucleotide mismatches (for example, due to sequence variances) in a nucleic acid sample involving the use of a nucleic acid probe derived from a hemizygous cell. Methods for determining the haplotype of a nucleic acid sample are also disclosed. Also disclosed are methods for producing the probe and kits containing the probe.

[0001] This application is a divisional of U.S. patent application Ser. No. 09/697,097, filed Oct. 26, 2000, which is still pending and which is a divisional of U.S. patent application Ser. No. 09/073,717, filed May 6, 1998, now U.S. Pat. No. 6,183,958.

BACKGROUND OF THE INVENTION

[0002] This invention relates to methods and reagents for detecting mispaired nucleotides in duplex nucleic acids for use, for example, in identifying genetic variations in nucleic acid sequences for research, therapeutic, and diagnostic applications.

[0003] Genetic variation occurs at approximately 1 out of every 100 bases within the genome. Research aimed at discovering genetic variation associated with diseases or disease therapies, as well as diagnostic tests aimed at using genetic information to manage patient care, requires efficient methods for detecting and typing genetic variance in various test sequences. Variances may be detected by a variety of methods. Many of these methods require the use of a probe with a unique sequence (representing a single allelic form of the sequence) as a reference by which to identify differences in the sequences of homologous DNA segments in patient test samples. Probes with a unique sequence are commonly produced from cloned DNA or cDNA. However, the use of probes from cloned DNA limits the ability to identify variances to DNA segments for which such clones are readily available, or alternatively requires the cloning of each DNA segment to be analyzed.

SUMMARY OF THE INVENTION

[0004] The present invention involves a general method for obtaining and using probes with unique sequences (monoallelic probes) from certain cells or tissues that are hemizygous for genes, chromosomal segments, or chromosomes that are the object of the analysis. Such probes are useful for the analysis of sequence variation, for example, by heteroduplex formation.

[0005] Accordingly, in a first aspect, the invention features a method for detecting a nucleotide mismatch in a nucleic acid sample that includes the steps of: (a) providing a nucleic acid probe derived from a hemizygous cell, the probe being complementary to a hemizygous chromosome or segment thereof present in the hemizygous cell; (b) forming a duplex between the nucleic acid sample and the probe; and (c) determining if the duplex contains a nucleotide mismatch.

[0006] In various embodiments of this aspect of the invention, the determining step is carried out using a denaturing gradient gel electrophoresis technique; the nucleotide mismatch represents a sequence variance in a population; the probe has a known sequence, and may be detectably labeled; the hemizygous cell results from the loss of a chromosome or segment thereof; the hemizygous cell includes multiple copies of the hemizygous chromosome or segment thereof; the hemizygous cell may be human; the hemizygous cell may be an immortalized cell; the hemizygous cell may be derived from a complete hydatidiform mole, an ovarian teratoma, an acute lymphocytic leukemia, an acute myeloid leukemia, a solid tumor, a squamous cell lung cancer, an endometrial ovarian cancer, a malignant fibrous histiocytoma, or a renal oncocytoma; the hemizygous cell may be NALM-16 or KBM-7; and the hemizygous cell may be derived from a haploid germ cell.

[0007] In yet other embodiments of the first aspect of the invention, the presence of the nucleotide mismatch correlates with a level of therapeutic responsiveness to a drug or other therapeutic intervention; the presence of the nucleotide mismatch indicates a disease or condition, or a predisposition to develop the disease or condition; the nucleic acid probe is produced by amplifying at least a portion of the hemizygous chromosome or segment thereof to produce the probe; the determining step utilizes a protein that binds or cleaves the nucleotide mismatch, for example, MutS or a resolvase (e.g., T4 endonuclease VII); and the determining step utilizes a chemical agent that detects the nucleotide mismatch. This method of the first aspect of the invention may be used to determine the haplotype of the nucleic acid sample.

[0008] In a second aspect, the invention features a method for detecting a nucleotide mismatch in a nucleic acid sample that includes the steps of: (a) providing a nucleic acid probe derived from a sex chromosome; (b) forming a duplex between the nucleic acid sample and the probe; and (c) determining if the duplex contains a nucleotide mismatch.

[0009] In a third aspect, the invention features a method for detecting a nucleotide mismatch in a nucleic acid sample that includes the steps of: (a) providing a nucleic acid probe derived from a somatic cell hybrid, the probe being complementary to a chromosome or segment thereof, where only one allele of the chromosome or segment thereof is present in the somatic cell hybrid; (b) forming a duplex between the nucleic acid sample and the probe; and (c) determining if the duplex contains a nucleotide mismatch.

[0010] In a fourth aspect, the invention features a kit for detecting a nucleotide mismatch that includes: (a) a nucleic acid probe derived from a hemizygous cell, the probe being complementary to a hemizygous chromosome or segment thereof; and (b) a means for detecting a nucleotide mismatch. In preferred embodiments, the probe is detectably labeled; the detecting means is a protein that binds or cleaves the nucleotide mismatch, for example, MutS or a resolvase (e.g., T4 endonuclease VII); and the detecting means is a chemical agent that detects the nucleotide mismatch.

[0011] In a fifth aspect, the invention features a method for producing a nucleic acid probe for the detection of a nucleotide mismatch that includes the steps of: (a) providing a hemizygous cell having at least one hemizygous chromosome or segment thereof; and (b) amplifying at least a portion of the hemizygous chromosome or segment thereof to produce the probe.

[0012] In a sixth aspect, the invention features a method for producing a nucleic acid probe for the detection of a nucleotide mismatch that includes the steps of: (a) providing nucleic acid from a hemizygous cell having at least one hemizygous chromosome or segment thereof; and (b) using the nucleic acid to produce a probe, the probe being complementary to at least a portion of the hemizygous chromosome or segment thereof. In one preferred embodiment, the nucleic acid is amplified, where the amplified nucleic acid is a representation of the genomic DNA of the hemizygous cell. In another embodiment of this aspect, the nucleic acid is an RNA or DNA library.

[0013] In preferred embodiments of the fifth and sixth aspects of the invention, the probe has a known sequence; the method further includes detectably labeling the probe; the hemizygous cell may be human; the hemizygous cell may be an immortalized cell; the hemizygous cell may be derived from a complete hydatidiform mole, an ovarian teratoma, an acute lymphocytic leukemia, an acute myeloid leukemia, a solid tumor, a squamous cell lung cancer, an endometrial ovarian cancer, a malignant fibrous histiocytoma, or a renal oncocytoma; the hemizygous cell is NALM-16 or KBM-7; and the hemizygous cell may be derived from a haploid germ cell.

[0014] In a seventh aspect, the invention features a nucleic acid probe for the detection of a nucleotide mismatch, the probe being derived from a hemizygous cell and being complementary to a hemizygous chromosome or segment thereof. In a preferred embodiment of this aspect of the invention, the probe is detectably labeled.

[0015] In an eighth aspect, the invention features a nucleic acid probe derived from an autosomal chromosome of a mammalian cell, the probe having a unique sequence. In one preferred embodiment of this aspect of the invention, the probe is detectably labeled.

[0016] In a final aspect, the invention features a method for determining if two nucleotide mismatches are located on the same strand of DNA in a nucleic acid sample that includes the steps of: (a) providing a first nucleic acid probe derived from a hemizygous cell, the first nucleic acid probe having a first unique sequence; (b) forming a first duplex between the nucleic acid sample and the first nucleic acid probe; (c) contacting the first duplex with a compound that cleaves a duplex containing a nucleotide mismatch under conditions which allow the compound to cleave the first duplex if the first duplex contains a nucleotide mismatch; (d) providing a second nucleic acid probe derived from a hemizygous cell, the second nucleic acid probe having a second unique sequence; (e) forming a second duplex between the product of step (c) and the second nucleic acid probe; (f) contacting the second duplex with the compound under conditions which allow the compound to cleave the second duplex if the second duplex contains a nucleotide mismatch; and (g) comparing the product of step (c) with the product of step (f), a reduction in the size of the product of step (f) compared to the product of step (c) indicating that both the nucleotide mismatches are located on the same strand of DNA in the nucleic acid sample.

[0017] In preferred embodiments of the ninth aspect of the invention, the method is used to determine the haplotype of the nucleic acid sample; and three or more nucleic acid probes are provided, each derived from a hemizygous cell and having a different unique sequence, and, for each nucleic acid probe, steps (e)-(g) are repeated, and the products of each cleavage step compared.

[0018] In other embodiments of this aspect of the invention, the compound may be a resolvase (e.g., T4 endonuclease VII) or may be a chemical; the comparing step is carried out using a denaturing gradient gel electrophoresis technique; the first nucleic acid probe and the second nucleic acid probe are derived from the same hemizygous cell; the first and second nucleic acid probes may have a known sequence, and may be detectably labeled; the hemizygous cell results from the loss of a chromosome or segment thereof; the hemizygous cell includes multiple copies of the hemizygous chromosome or segment thereof; the hemizygous cell may be human; the hemizygous cell may be an immortalized cell; the hemizygous cell may be derived from a complete hydatidiform mole, an ovarian teratoma, an acute lymphocytic leukemia, an acute myeloid leukemia, a solid tumor, a squamous cell lung cancer, an endometrial ovarian cancer, a malignant fibrous histiocytoma, or a renal oncocytoma; the hemizygous cell may be NALM-16 or KBM-7; and the hemizygous cell may be derived from a haploid germ cell.

[0019] In yet other embodiments of the ninth aspect of the invention, the location of two nucleotide mismatches on the same strand of DNA in a nucleic acid sample correlates with a level of therapeutic responsiveness to a drug or other therapeutic intervention; the location of two nucleotide mismatches on the same strand of DNA in a nucleic acid sample indicates a disease or condition, or a predisposition to develop the disease or condition; and the nucleic acid probes are produced by amplifying at least a portion of the hemizygous chromosome or segment thereof to produce the probes.

[0020] By a “hemizygous cell” is meant a mammalian cell having one or more autosomal chromosomes, or segments thereof, which are derived from only one parental copy and whose genome therefore contains one unique sequence (i.e., is completely homozygous) at those chromosomal locations. Included within this definition is a cell having two (or even more) identical copies of this unique sequence chromosome (or segment thereof), most commonly as the result of a chromosomal duplication event. Such unique sequence autosomal chromosomes are referred to herein as “hemizygous chromosomes.”

[0021] By a “unique sequence” is meant the nucleotide sequence of the hemizygous chromosomes in a hemizygous cell, where substantially all of the homologous chromosomes in the cell contain the same base at every position within the sequence. By a probe having a “unique sequence” is meant that substantially all copies of the probe made from a hemizygous cell contain the same base at every position within the sequence. In a solution of such a unique sequence probe, different bases comprise less than 1%, preferably, less than 0.1%, and, more preferably, less than 0.01% of the bases present at any given position in that probe in the solution. Typically, these low frequency base changes are introduced during probe preparation (for example, during PCR amplification) and do not represent base differences present in the chromosomal sequence from which the probe was generated. By “base” is meant a nucleotide, including an A (dATP), G (dGTP), C (dCTP), or T (dTTP) for DNA, and an A (ATP), G (GTP), C (CTP), or U (UTP) for RNA, as well as chemical derivatives of these bases commonly known in the art that are substrates for polymerases and which may be incorporated into amplified sequences.
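The purity thresholds in the preceding definition (less than 1%, 0.1%, or 0.01% differing bases at any given position) can be illustrated with a minimal Python sketch. The sketch assumes, purely for illustration, that individual probe copies are available as equal-length sequence strings; the function names minor_base_fraction and is_unique_sequence are hypothetical and do not come from the specification.

```python
from collections import Counter


def minor_base_fraction(copies):
    """For each position, return the fraction of probe copies whose base
    differs from the most common base at that position."""
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all probe copies must have the same length")
    fractions = []
    for position in range(length):
        counts = Counter(copy[position] for copy in copies)
        majority_count = counts.most_common(1)[0][1]
        fractions.append(1 - majority_count / len(copies))
    return fractions


def is_unique_sequence(copies, threshold=0.01):
    """True if differing bases stay below the threshold at every position.

    The default of 0.01 mirrors the 'less than 1%' figure in the text;
    0.001 or 0.0001 would correspond to the preferred, stricter limits.
    """
    return all(f < threshold for f in minor_base_fraction(copies))


# Hypothetical data: 999 identical copies plus one copy with a PCR error.
mostly_pure = ["ACGT"] * 999 + ["ACGA"]
# Hypothetical data: a probe amplified from a heterozygous template.
heterozygous = ["ACGT"] * 500 + ["ACGA"] * 500
```

In this sketch the first population passes the 1% test but fails a 0.01% test, while the population amplified from a heterozygous template fails at any of the stated thresholds.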

[0022] By a “probe” is meant a nucleic acid molecule derived from a gene, chromosomal segment, or chromosome that is used as a reference, for example, in variance detection to determine whether a test sample of the same gene, chromosomal segment, or chromosome derived from a particular individual contains the identical sequence or a different sequence at one or more nucleotide positions. Probes may be derived from genomic DNA or cDNA, for example, by amplification, or from cloned DNA segments and, most commonly, contain either genomic DNA or cDNA sequences representing all or a portion of a single gene from a single individual. Preferably, the probe has a unique sequence (as defined above) and/or has a known sequence.

[0023] By “autosomal chromosome” is meant any chromosome within a normal somatic or germ cell except the sex chromosomes. In humans, for example, chromosomes 1-22 are autosomal chromosomes.

[0024] By “sex chromosome” is meant a chromosome, or a segment thereof, the presence, absence, or alteration of which affects the gender of the organism from which the chromosome is derived. Human sex chromosomes, for example, are the X chromosome and the Y chromosome.

[0025] By a “haploid germ cell” is meant a sperm cell or an oocyte (i.e., an unfertilized egg cell).

[0026] By “haplotype” is meant an allele or a group of alleles (i.e., a specific set of nucleotides at variant positions) on a single chromosome or segment thereof.

[0027] By “polypeptide” or “protein” is meant any chain of amino acids, regardless of length or post-translational modification (for example, glycosylation or phosphorylation).

[0028] By “detectably labeled” is meant that a molecule is marked or identified by some means that may be observed or assayed. Methods for detectably labeling a molecule are well known in the art and include, without limitation, radioactive labeling (for example, with an isotope such as ³²P or ³⁵S), enzymatic labelling (for example, using horseradish peroxidase), chemiluminescent labeling, and fluorescent labeling (for example, using fluorescein). Also included in this definition is a molecule that is detectably labeled by an indirect means, for example, a molecule that is bound with a first moiety (such as biotin) that is, in turn, bound to a second moiety that may be observed or assayed (such as fluorescein-labeled streptavidin).

[0029] By “resolvase” is meant any protein that is capable of cleaving a mismatch (for example, a mismatch loop) in a heteroduplex, or is capable of cleaving a cruciform DNA. Examples of resolvases include, without limitation, T4 endonuclease VII, Saccharomyces cerevisiae Endo X1, Endo X2, Endo X3, and CCE1 (Jensch et al., EMBO J. 8: 4325-4334, 1989; Kupfer and Kemper, Eur. J. Biochem. 238: 77-87, 1996), T7 endonuclease I, E. coli MutY (Wu et al., Proc. Natl. Acad. Sci. USA 89: 8779-8783, 1992), mammalian thymine glycosylase (Wiebauer et al., Proc. Natl. Acad. Sci. USA 87: 5842-5845, 1990), topoisomerase I from human thymus (Yeh et al., J. Biol. Chem. 266: 6480-6484, 1991; Yeh et al., J. Biol. Chem. 269: 15498-15504, 1994), and deoxyinosine 3′ endonuclease (Yao and Kow, J. Biol. Chem. 269: 31390-31396, 1994). To carry out mismatch detection, one or several resolvases may be utilized. A resolvase represents one type of protein that may be used to detect a mismatch.

[0030] By “bindase” is meant any protein that is capable of specifically binding to, but not cleaving, a heteroduplex. A bindase represents one type of protein that may be used to detect a mismatch, and may be used alone, with another bindase, or with one or more resolvases to carry out mismatch detection. One exemplary bindase is E. coli MutS.

[0031] By a “chemical agent that detects a heteroduplex” is meant a chemical agent that modifies mismatched nucleotides. Examples of such chemical agents are carbodiimide, hydroxylamine, osmium tetroxide, and potassium permanganate, which are used in the carbodiimide (CDI) and the Chemical Cleavage of Mismatch (CCM) methods (Smooker and Cotton, Mutat. Res. 288: 65-77, 1993; Roberts et al., Nucl. Acids Res. 25: 3377-3378, 1997). In a given mismatch detection assay, one or several chemical agents or methods may be utilized.

[0032] By “duplex” is meant a structure formed between two annealed complementary nucleic acid strands (for example, one nucleic acid strand from a test sample and one nucleic acid strand from a probe) in which sufficient sequence complementarity exists between the strands to maintain a stable hybridization complex. A duplex may be either a homoduplex, in which all of the nucleotides in the first strand appropriately base pair with all of the nucleotides in the second opposing complementary strand, or a heteroduplex. By a “heteroduplex” is meant a structure formed between two annealed strands of nucleic acid in which one or more nucleotides in the first strand do not or cannot appropriately base pair with one or more nucleotides in the second opposing (i.e., complementary) strand because of one or more mismatches. Examples of different types of heteroduplexes include those which exhibit an exchange of one or several nucleotides, and insertion or deletion mutations, each of which is described in Bhattacharyya and Lilley (Nucl. Acids Res. 17: 6821-6840, 1989).

[0033] By “complementary” is meant that two nucleic acids, e.g., DNA or RNA, contain a sufficient number of nucleotides which are capable of forming Watson-Crick base pairs to produce a region of double-strandedness between the two nucleic acids. Thus, adenine in one strand of DNA or RNA pairs with thymine in an opposing complementary DNA strand or with uracil in an opposing complementary RNA strand. It will be understood that each nucleotide in a nucleic acid molecule need not form a matched Watson-Crick base pair with a nucleotide in an opposing complementary strand to form a duplex.

[0034] By “mismatch” is meant that a nucleotide in one strand does not or cannot pair through Watson-Crick base pairing and π-stacking interactions with a nucleotide in an opposing complementary strand. For example, adenine in one strand would form a mismatch with adenine, cytosine, or guanine in an opposing nucleotide strand. In addition, a mismatch occurs when a first nucleotide cannot pair with a second nucleotide in an opposing strand because the second nucleotide is absent (i.e., an unmatched nucleotide).
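The definitions of duplex, heteroduplex, complementary, and mismatch can be made concrete with a short Python sketch. It is an illustration only, under stated assumptions: the two strands are supplied already aligned position by position (the opposing strand written in the antiparallel orientation so that index i faces index i), an absent nucleotide is written as "-", and only Watson-Crick pairing is modeled (stacking interactions are ignored). The names WATSON_CRICK, find_mismatches, and classify_duplex are hypothetical.

```python
# Watson-Crick pairs for DNA and RNA, as (base, opposing base).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
}


def find_mismatches(strand, opposing):
    """Return the positions at which the two aligned strands do not form
    a Watson-Crick pair. A '-' marks an absent (unmatched) nucleotide."""
    if len(strand) != len(opposing):
        raise ValueError("strands must be aligned to the same length")
    pairs = zip(strand.upper(), opposing.upper())
    return [i for i, pair in enumerate(pairs) if pair not in WATSON_CRICK]


def classify_duplex(strand, opposing):
    """Label the duplex 'homoduplex' if every position pairs, otherwise
    'heteroduplex'."""
    return "heteroduplex" if find_mismatches(strand, opposing) else "homoduplex"
```

For example, under these assumptions find_mismatches("ACGT", "TGAA") reports position 2, where guanine faces adenine, and find_mismatches("ACGT", "TG-A") reports the same position because the opposing nucleotide is absent.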

[0035] By a “disease” is meant a condition of a living organism which impairs normal functioning of the organism, or an organ or tissue thereof.

[0036] By an “immortalized cell” is meant a cell that is capable of undergoing a substantially unlimited number of cell divisions in vivo or in vitro. One example of an immortalized cell is a cell into which (or into an ancestor of which) has been introduced an exogenous gene or gene product (e.g., an oncogene) or virus (e.g., Epstein-Barr virus) which allows that cell to divide an unrestricted number of times. An immortalized cell may also arise from a genomic mutation in an endogenous gene that gives rise to a mutated gene product or dysregulation of an endogenous gene product (e.g., a dysregulation that allows the overexpression of a cell cycle regulatory gene). An immortalized cell is distinguished from a stem cell in that an immortalized cell has an alteration affecting normal gene expression and/or regulation. Exemplary immortalized cells include cancer cell lines, such as those that have been generated from solid and non-solid tumors. Such cells are commercially available, for example, from the American Type Culture Collection (see ATCC Catalog of Cell Lines and Hybridomas, Rockville, Md.). Immortalized cells also include naturally-occurring or artificially-generated cell lines.

[0037] Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

BRIEF DESCRIPTION OF THE DRAWING

[0038] FIG. 1 is a schematic representation of the four possible alleles (1-4) for variance #1 and variance #2 located 300 nucleotides apart on a chromosomal fragment.
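The count of four alleles follows from combining two variances that each occur in two forms. The Python sketch below enumerates the combinations; the particular bases and the function name possible_alleles are hypothetical placeholders, and the order in which the combinations are listed is arbitrary and is not asserted to match the numbering used in FIG. 1.

```python
from itertools import product


def possible_alleles(variance_1_forms, variance_2_forms):
    """List every combination of one form of variance #1 with one form
    of variance #2 on a single chromosomal fragment."""
    return list(product(variance_1_forms, variance_2_forms))


# Hypothetical example: variance #1 is A or G, variance #2 is C or T.
alleles = possible_alleles(("A", "G"), ("C", "T"))
for number, (form_1, form_2) in enumerate(alleles, start=1):
    print(f"allele {number}: variance #1 = {form_1}, variance #2 = {form_2}")
```

With two forms at each of the two positions, the enumeration yields 2 x 2 = 4 combinations.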

DETAILED DESCRIPTION

[0039] A variety of methods are described herein for the rapid and accurate detection of DNA sequence variation for research, therapeutic, or diagnostic purposes. Table I lists exemplary techniques to detect and/or resolve sequence differences, grouped according to the characteristics of the physical or enzymatic method used. All of these methods involve the formation of heteroduplexes, either between a probe and a test sample or between alleles of a test sample.

TABLE I
Methods for DNA variance detection based on heteroduplex formation

Method: Heteroduplex Analysis (HA); also called Conformation Sensitive Gel Electrophoresis
Principle: Heteroduplexes have altered mobility (usually they run slower) in nondenaturing acrylamide gels relative to homoduplexes, apparently due to DNA kinking at sites of mismatch. Alternative gel matrices can increase resolution.
References: White et al., Genomics 12: 301-306, 1992; Ganguly et al., Proc. Natl. Acad. Sci. USA 90: 10325-10329, 1993; Williams et al., Hum. Mol. Gen. 4: 309-312, 1995.
Sensitivity (false negatives): Not extensively characterized, but in general, probably 80-90% sensitive. (For example, 8 of 9 point mutations were detected by White et al.)
Selectivity (false positives): False positive rate appears low from published data.
Probe Issues: Assay can be run with or without probe. Probe allows one step labelling, standardization of assay conditions, simplified interpretation of results, and serves as a control.

Method: Chemical cleavage of mismatches (CCM)
Principle: Mispaired nucleotides in heteroduplex DNA are accessible to base modifying reagents such as hydroxylamine, osmium tetroxide, potassium permanganate, and carbodiimide. The modified bases can subsequently be specifically cleaved with piperidine or other chemicals.
References: Hydroxylamine & osmium tetroxide: Cotton et al., Proc. Natl. Acad. Sci. USA 85: 4397-4401, 1988; Cotton et al., Nucl. Acids Res. 17: 4223-4233, 1989; Smooker et al., Mut. Res. 288: 65-77, 1993. Carbodiimide: Novack et al., Proc. Natl. Acad. Sci. USA 83: 586-590, 1986.
Sensitivity (false negatives): Sensitivity of hydroxylamine/osmium tetroxide method is excellent, although efficiency of cleavage is not 100% or homogeneous among different sites. The carbodiimide method is not well tested.
Selectivity (false positives): Background can be high; method is highly sensitive to cleavage conditions.
Probe Issues: Assay can be run with or without probe. Probe allows one step labelling, standardization of assay conditions, simplified interpretation of results, and serves as a control.

Method: Ribonuclease cleavage of RNA:DNA or RNA:RNA heteroduplexes
Principle: Ribonuclease A cleaves RNA:DNA heteroduplexes specifically at mismatched bases on the RNA strand. Conditions for using other RNAases have been discovered (MisMatch Detect II kit, Ambion, Inc., Austin, TX) which reportedly improve sensitivity to nearly 100%.
References: Myers et al., Science 230: 1242-1246, 1987; Goldrick et al., Biotechniques 21: 106-112, 1996.
Sensitivity (false negatives): Theoretically high but as a practical matter the method is subject to many artifacts and is not widely used. RNAse does not cut with equal efficiency at all mismatches; background can be high.
Selectivity (false positives): Higher false positive rate than most other methods.
Probe Issues: Assay is generally performed with a probe. Probe allows one step labelling, standardization of assay conditions, simplified interpretation of results, and serves as a control.

Method: Denaturing Gradient Gel Electrophoresis (DGGE) and related methods (e.g., denaturing HPLC (DHPLC) and temperature gradient gel electrophoresis (TGGE))
Principle: Double stranded DNA or RNA fragments are resolved on the basis of conformational differences associated with partial strand melting as they migrate through a gradient of denaturant in a gel. The denaturant can be chemical, e.g., DGGE, where a gradient of formamide and urea is used, or thermal, e.g., thermal gradient gel electrophoresis (TGGE).
References: Abrams and Stanton, Methods in Enzymology 212: 71-104, 1992; Abrams et al., Genomics 7: 463-475, 1990; Myers et al., pp. 71-88 in Erlich, H.A. (ed.), PCR Technology: Principles and Applications for DNA Amplification, Stockton Press, New York, 1989.
Sensitivity (false negatives): Sensitivity is nearly 100% with appropriate design of GC clamps based on analysis of melting maps. Sensitivity is much lower for natural (non-clamped) sequences.
Selectivity (false positives): False positives are minimal.
Probe Issues: Probes are not required (sample-sample heteroduplexes will suffice) but provide advantages of speed, consistency, and ease of automation.

Method: Resolvases, including T4 endonuclease VII and T7 endonuclease I, which recognize and cleave mismatches
Principle: Resolvases are enzymes that recognize and cleave branched DNA intermediates. They also recognize and cleave DNA heteroduplexes containing single base mismatches, deletions, or insertions. Resolvases have been identified in a wide variety of species.
References: Youil et al., Genomics 32: 431-435, 1996; Youil et al., Proc. Natl. Acad. Sci. USA 92: 87-91, 1995; Mashal et al., Nature Genetics 9: 177-183, 1995.
Sensitivity (false negatives): Sensitivity is virtually 100%, a claim supported by 86 of 86 globin mutations detected by R. Cotton (Genomics 32: 431, 1996). However, cleavage efficiency is highly variable.
Selectivity (false positives): False positives are minimal with optimal conditions. Impure probes result in background cleavages.
Probe Issues: Assay can be run with or without probe. Probe allows one step labelling, standardization of assay conditions, simplified interpretation of results, and serves as a control.

Method: Mismatch specific bindases, such as MutS, bind to heteroduplexes
Principle: Bindases are proteins that selectively bind heteroduplexes. Bindases can be immobilized on a solid phase and label on heteroduplexes can be detected as bound material after washing off non-bound homoduplexes. MutS is the best studied example, but there are related
References: Debbie, P., et al., Nucleic Acids Research 25: 4825-4829, 1997; Wagner, R., et al., Nucleic Acids Research 19: 3944-3948, 1995.
Sensitivity (false negatives): The sensitivity is reportedly high, although no systematic surveys of large well characterized mutant collections have been done. Deletions of greater than 4 bases will be missed.
Selectivity (false positives): The selectivity is reportedly excellent, however, signal to noise ratio is generally low (approx. 2-fold) suggesting problems with selectivity.
Probe Issues: Assay can be run with or without probe. Probe allows one step labelling, standardization of assay conditions, simplified interpretation of results, and serves as a control.

[0040] We have determined that it would be preferable in many instances to derive probes without cloning through the amplification of gene segments from a template of genomic DNA or cDNA. Since most genes in an individual are present in two copies, one inherited from each parent, that normally differ in their sequences at one or more positions, and the positions of all such variations are not known, it is not generally possible to derive a probe which has a unique base present at each position within the sequence. This results in a probe that can form a heteroduplex with itself, creating a variety of possible background problems for variance detection. Described herein are novel methods for producing, without cloning, probes that have unique base sequences. Such probes are useful for identifying variant sequences in genes derived from an individual.

[0041] It is known that the sequence of a specific gene differs substantially in different individuals. Estimates suggest that 1 in 100 bases to 1 in 300 bases within the sequence of a gene vary between individuals in human populations. Many variations are pathogenic mutations that cause disease, but the vast majority are non-pathogenic changes that contribute to normal human variability or the variable responses of individuals to their environment. Others may have no biological consequence. Methods for detecting variances in gene sequences have important applications both for research into the cause of disease and for medical diagnostics. Methods for detecting variances also have broad application in the fields of animal breeding and plant genetics in that variances may be discovered to be associated with phenotypes of interest.

[0042] For use in the methods described herein, a test sample (that is, a segment of genomic DNA or cDNA derived from an individual by cloning or amplification that contains a sequence of interest) is analyzed. The term “derived from an individual” refers to cells, such as blood, tissue, or cultured cells, that are obtained originally from an individual. The test sample may be derived from an individual for use in research or diagnostic testing aimed at discovering variances that exist in the population, variances that potentially represent the cause of disease, the cause of normal variation, or silent variation. In one example, the test sample may be derived from a patient with a disease, a condition, or another phenotype of interest, to determine whether that patient's gene contains variances that are associated with a particular disease, predisposition to disease, prognosis, or response to therapy. Alternatively, the test sample may be derived from a patient with or without disease symptoms to diagnose whether that patient has a disease, or a predisposition to a disease, or to assess the patient's present or further response to therapy. For research aimed at the discovery of variances and for diagnostic methods, it is preferable to have a probe with a unique sequence in order to identify variations from this reference sequence.

[0043] Variances may be detected in test samples by forming duplexes between the two parental copies present in a test sample. If the two sequences differ, heteroduplexes will be formed. This approach is less preferred for the following reasons. First, each sample must be labelled separately, increasing the likelihood of non-uniform or inadequate labelling and requiring new labelling for each new batch of test samples. Second, any sequence differences are measured against an unknown reference sequence; homozygosity for variances of interest will not be detected. And, third, there is no known hemizygous reference sample against which to compare the test samples.

[0044] A preferred method for detecting variances involves hybridizing a probe to a test sample and analyzing the resulting double stranded molecule for heteroduplexes which arise when the sequences of the probe and test sample differ at one or more positions. As described above, a heteroduplex is any double stranded DNA molecule in which the two strands differ at one or more positions. Strands form a homoduplex if the sequences of the two strands allow normal base pairing to occur at every position within the double stranded DNA molecule (i.e., G binding to C, and A binding to T). Strands form a heteroduplex when normal base pairing does not occur at one or more positions. This results when the sequence of the test sample differs from the sequence of the probe, for example, due to a transition, transversion, insertion, or deletion of nucleotides within the gene sequence. The absence of heteroduplexes reveals that the sequence of the test sample is identical to that of the probe. The presence of heteroduplexes reveals that the sequence of the test sample contains variances at one or more positions, and may also reveal the location and nature of the variance.

[0045] If the probe does not have a unique sequence, this analysis may be complicated by the formation of heteroduplexes between different probe molecules, rather than between the probe and the test sequence, thus compromising the analysis of the test samples. A probe which does not have a unique sequence and which leads to formation of heteroduplexes between different constituents of the probe population generates a background signal of noninformative heteroduplexes that can mask informative heteroduplexes formed between probe and test sample strands, resulting in a failure to detect true sequence variants. The unique sequence probes described herein are used to form duplexes with test samples to determine whether the sequence of the test sample is identical to the probe or whether it contains variances.

[0046] Heteroduplexes can be distinguished from homoduplexes by avariety of methods known in the art including, without limitation,methods based on the physical structures that are formed by mismatchedbase pairing, the altered thermal stability of heteroduplexes (DNAmelting behavior) as opposed to that of homoduplexes, the recognition ofmismatched bases by mismatch recognition enzymes (such as elements ofthe DNA repair system or resolvases), or chemical reactions withmispaired bases. These methods can utilize, for example, (i) the alteredelectrophoretic mobility of heteroduplexes and homoduplexes, either in anondenaturing gel, denaturing gel, gradient denaturing gel, or by liquidchromatography, (ii) the susceptibility of heteroduplexes to binding byenzymes (e.g., MutS) that recognize heteroduplexes, or (iii) thesusceptibility of heteroduplexes to cleavage by enzymes (e.g., T4endonuclease VII) that recognize heteroduplexes. Techniques that detectaltered electrophoretic mobility include heteroduplex analysis (HA),constant denaturant gel electrophoresis (CDGE), denaturing gradient gelelectrophoresis (DGGE), and denaturing high pressure liquidchromatography (DHPLC). Exemplary techniques that may be used to detectbinding of mismatches in heteroduplexes utilize enzymes such as E. coliMutS (or equivalent proteins from various species) and related DNArepair enzymes. Techniques that assay cleavage of heteroduplexes utilizeenzymes such as MutS in combination with other components of thebacterial DNA nucleotide repair system such as MutH and MutL (that is, aMutHLS complex, or equivalent protein complexes from other species) orenzymes such as phage T4 endonuclease VII or phage T7 endonuclease I.

[0047] In a preferred approach for carrying out variance detection, theunique sequence probe of the invention is detectably labeled, whichallows the heteroduplex, or a fragment of the heteroduplex, to bevisualized by analytical equipment or imaging. Commonly, nucleotidederivatives are used in the production of the probe which facilitate theincorporation of a moiety into the probes that can be directlyvisualized or that participates in a reaction that creates a detectableevent. Examples of detectable labels include, but are not limited to,radioactive atoms, fluorescent or chemiluminescent molecules, enzymes,or affinity ligands. Preferably, the label on the probe is used toidentify heteroduplexes or homoduplexes formed by hybridizing the probeto the test sample. In an alternative approach, the detectable label mayinstead be incorporated into the test sample, and the detectably labeledtest samples used to identify heteroduplexes or homoduplexes formedfollowing probe hybridization.

[0048] It will be understood that the terms “probe” and “test sample” typically refer to solutions or suspensions of probes or test samples of sufficient amount and sufficient concentration to form and detect double stranded heteroduplexes and homoduplexes. As known in the field of genetic variance detection, such analysis may require amounts of material ranging from 1 pg to 1 μg per reaction, representing as many as 10⁶ to 10¹² molecules per reaction, where each reaction involves the mixture of a probe and a test sample to form heteroduplexes or homoduplexes. It is preferable to produce probes in sufficient quantity to allow one batch of probe to be used to assay a large number of test samples, for example, more than 100, or preferably more than 1,000 or even 10,000 test samples. This simplifies standardization of the procedures and improves efficiency for research or diagnostic applications.
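
The relationship between mass and number of molecules can be checked with a short calculation. The sketch below assumes a double stranded fragment of 1,000 base pairs and an average mass of 650 g/mol per base pair; neither figure is stated in this description, and both are used only for illustration.

```python
# Illustrative sketch only: converts a mass of double stranded DNA into an
# approximate number of molecules.

AVOGADRO = 6.02214076e23   # molecules per mole
AVG_BP_MASS = 650.0        # g/mol per base pair (assumed average)


def molecules_from_mass(mass_g: float, length_bp: int) -> float:
    """Approximate number of duplex molecules in mass_g grams of DNA."""
    moles = mass_g / (length_bp * AVG_BP_MASS)
    return moles * AVOGADRO


if __name__ == "__main__":
    for label, grams in (("1 pg", 1e-12), ("1 ug", 1e-6)):
        n = molecules_from_mass(grams, length_bp=1000)  # assumed 1 kb fragment
        print(f"{label}: about {n:.1e} molecules")
```

Under these assumptions, 1 pg corresponds to roughly 10⁶ molecules and 1 μg to roughly 10¹² molecules; shorter fragments give proportionally more molecules per unit mass.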

[0049] The large amount of material required for generating probes ismost commonly produced from a cloned segment of genomic DNA or cDNAcorresponding to the gene to be analyzed. This may involve, for example,making a preparation of plasmid or M13 and preparing the probe byrestriction digestion, polymerization, or PCR amplification using theclone as a template. This requires having a cloned version of a gene tobe analyzed. When the goal is to analyze many different DNA sequences,the isolation of cloned versions of all the sequences may beinefficient. This is particularly true if the sequences to be analyzedare cDNAs, or long segments of genomic DNA. In the former case it may betechnically difficult to obtain a full length cDNA clone; in the lattercase, many different contiguous segments would need to be successfullycloned.

[0050] We have determined that it would be advantageous to have probesthat may be reliably produced from a genomic DNA or cDNA template.However, materials produced by PCR amplification of the genomic DNA orcDNA template from normal cells or tissues of any particular individualcontain sequences derived from two different gene copies (e.g.,maternally and paternally derived) that commonly differ at one or morepositions and are therefore not optimal for use as probes. As describedabove, the presence of heteroduplexes formed within such probe materialinterferes with the detection of heteroduplexes formed with the testsample. It should be emphasized that any material derived from anindividual who has two copies of a gene may be presumed not to representa unique sequence given the high frequency of genetic variances in thepopulation. While an individual may be said to be homozygous by havingthe same sequence at certain positions, in virtually all cases, the genewill not be identical at all positions, and it may be presumed that aprobe developed from a template of genomic DNA or cDNA from such anindividual would not be unique at all or substantially all nucleotidebases.

[0051] We have determined that probes may be reliably produced fromgenomic DNA or cDNA segments from cells that contain only one parentalcopy of the gene to be analyzed, for example, by PCR amplification.Furthermore, we have identified classes of tissues and cells, as well asspecific cells derived from tumors, tumor cell lines, or hybrid celllines, that contain only one allele of the gene to be analyzed and inwhich the template of genomic DNA (or a cDNA thereof) comprises a singleparental copy of the gene sequence and provides a probe with a uniquebase at each position.

[0052] In one example, the template may be derived from one copy of agene, chromosomal segment, or chromosome remaining in a cell (eithermaternal or paternal) after loss of the heterologous gene, chromosomalsegment, or chromosome. Similarly, the template may be derived from morethan one copy of a gene, chromosomal segment, or chromosome present in acell, each gene, chromosomal segment, or chromosome being derived from asingle parental copy (either maternal or paternal).

[0053] It should be noted that it is not necessary that the tissue, cellline, or tumor have a single copy of the chromosome or chromosomes to beused for probe generation (i.e., it need not be haploid). Often tumorsor cell lines, when reduced to one copy of a chromosome, duplicate thatone copy so that, while no longer haploid, they remain hemizygous at alllocations on the duplicated chromosome or chromosome fragment, meaningthat only one version of the chromosome is present. Duplication of asingle chromosome in such instances to create, for example, two copiesof a chromosome is distinct from having two different chromosomes thatmay be homozygous for certain sequences. In contrast, duplicatedchromosomes in hemizygous cells may be presumed to have uniquesequences.

[0054] We use the term “gene” to refer to a sequence of DNA thatincludes both coding and noncoding regions. The term “allele” refers toone specific form of a gene within a cell or within a population, thespecific form differing from other forms of the same gene at one, andfrequently more than one, variant site within the gene sequence. Thesequences at the variant sites that differ between different alleles aretermed “variances,” “polymorphisms,” or “mutations.” In general, normalcells contain two alleles of each gene, with one allele inherited fromeach parent.

[0055] The term “template” refers to a source of material that is thesubstrate for preparation of a probe using an amplification reaction.For example, a probe having a unique sequence can be made by amplifying,as a template, a hemizygous chromosome. Methods for amplificationinclude, without limitation, polymerase chain reaction, ligase chainreaction, NASBA, SDA, 3SR, and TSA. It will be understood that, in anygiven method for generating a probe having a unique sequence, one orseveral amplification methods may be employed.

[0056] A second method for generating a probe having a unique sequenceis to prepare nucleic acid from a hemizygous cell, and to use thisnucleic acid to produce the probe. For example, cDNA may be generatedfrom the RNA of a hemizygous cell and used to create a cDNA libraryaccording to standard molecular biology techniques (see, e.g., Ausubelet al., Current Protocols in Molecular Biology, John Wiley & Sons, NewYork, 1997; Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nded., Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 1989). Theprobe may then be generated from the library by a variety of methodsincluding, without limitation, restriction endonuclease digestion,polymerase chain reaction, or the amplification methods cited herein.This method is particularly useful if the probe is from a hemizygouscell that is not an immortalized cell.

[0057] In other variations for producing a probe having a uniquesequence, the template may be genomic DNA or cDNA derived from a cell ortissue in which the gene of interest is present in a single parentalcopy. Another template may be genomic DNA or cDNA derived from a cell ortissue in which a chromosome fragment containing the gene of interest ispresent in a single parental copy. The template may also be genomic DNAor cDNA derived from a cell or tissue in which one or more chromosomescontaining genes of interest are present in single parental copies.Preferably, at least one chromosome, more preferably at least 5chromosomes, yet more preferably, at least 15 or 20 chromosomes, and,most preferably, all autosomal chromosomes are present in a singleparental copy in a cell used for probe production. Probes for genespresent on the X chromosome or Y chromosome may be derived fromtemplates of genomic DNA and cDNA derived from a cell line or tissuethat has only one parental copy of the X chromosome and one parentalcopy of the Y chromosome (e.g., male cell lines or tissues). Ideally thecell line or tumor has a complete set of chromosomes derived from oneparent, since in that case one source can be used to produce a probe forany DNA or cDNA segment, regardless of its chromosomal location. This isparticularly useful in cases where the chromosome location of the probesequence is not known. Alternatively, two or more cell lines may,between them, be hemizygous for all of the chromosomes of the species ofinterest.

[0058] Another cell or tissue from which the template is derived maycontain genes from species other than humans in addition to the humangene that is the template for producing a probe (such as a somatic cellhybrid). For example, the cell may contain a complete set of genes froma species other than humans and one or more human chromosome segments orchromosomes. Preferably, the cell contains more than one humanchromosome, more preferably, at least 5 human chromosomes, yet morepreferably, at least 15 or 20 human chromosomes, and, most preferably,all human autosomal chromosomes present in a single parental copy. Thecell may also contain one copy of the human X chromosome or one copy ofthe human Y chromosome. In examples of preferred somatic cell hybrids,the non-human genes are of mouse or Chinese hamster origin.

[0059] It will be apparent that the methods described herein are alsoapplicable to analysis of genes from other species using, as sources fortemplates, cells or tissues that have only one parental copy of one ormore genes from that species. For example, these techniques may beemployed to create a bovine probe with a unique sequence, which may thenbe used to identify variant sequences in bovine genes. Such variantbovine sequences may be used for research, therapeutic, or diagnosticmethods related to diseases in cattle (e.g., brucellosis). Similarapproaches may also be used for veterinary techniques involving otherlivestock or domestic pets. Examples of hemizygous cell lines from otherspecies are known in the art (Li et al., Mol. Mar. Biol. Biotechnol.3(4):217-227, 1994; Bostock, Exp. Cell. Res. 106(2):373-377, 1977).

[0060] As applied to humans, a number of hemizygous cells and tissuesmay be used to provide suitable templates for genomic DNA or cDNA forproducing probes having unique sequence. There now follow exemplarysources for hemizygous probe production and exemplary mismatch detectiontechniques. These examples are provided for the purpose of illustratingthe invention and should not be construed as limiting.

[0061] Gestational Trophoblastic Tumors which are Derived from a Single Germ Cell

[0062] Complete hydatidiform moles (CHM) are commonly derived from asingle germ cell. While the cells of a CHM commonly contain two copiesof each chromosome, both copies are derived from the same parent—nearlyalways the father. Most CHMs (approximately 70%) are derived from asingle sperm which presumably undergoes duplication after fertilizationof an empty, or anuclear egg. Choriocarcinomas are malignant gestationaltrophoblastic tumors believed to arise, in at least some cases, fromCHMs. Most choriocarcinomas are genetically heterozygous; however, thereare reports of completely homozygous choriocarcinomas (Fisher et al.,Br. J. Cancer 58: 788-792, 1988). DNA or RNA prepared from CHMs orchoriocarcinomas may be used to prepare monoallelic DNA or cDNA probes(i.e., probes derived from a single allele and therefore having uniquesequence) by the methods described above. Alternatively, cell linesderived from such tumors may be used to provide a more convenient,easily accessible, and inexhaustible supply of DNA or RNA. A number ofcell lines have been established from choriocarcinomas and aredescribed, for example, in Nakamura and Yamashita (Nippon Sanka FujinkaGakkai Zasshi 38: 477-484, 1986). Some choriocarcinoma cell lines havebeen characterized cytogenetically or using RFLP or other DNA markersand have been found to be heterozygous; other choriocarcinoma cell lineshave not yet been tested. Several cell lines have been developed fromCHMs as well, but there are no reports of cytogenetic or molecularcharacterization of these cells.

[0063] Ovarian Teratomas that are Derived from a Single Germ Cell

[0064] Benign ovarian teratomas are entirely maternal in theirderivation, and most commonly have a normal karyotype of 46,XX. It isestimated that as many as 65% of teratomas (Surti et al., Amer. J. Hum.Gen. 47: 635-643, 1990) are derived from a single maternal germ cellafter failure of meiosis II (type II teratomas) or endoreduplication ofa mature ovum (type III teratomas). Such teratomas are hemizygous forall loci in the genome, containing only one copy of the chromosomesnormally present in the mother. Either tissue or cell lines establishedfrom such teratomas may be used as templates for DNA or RNA forpreparation of monoallelic DNA or cDNA probes.

[0065] Leukemias or Leukemia Cell Lines with Near Haploid Karyotypes, or that Have Passed Through a Stage of Near Haploidy

[0066] Up to 10% of acute myeloid leukemias in blast phase, as well as some acute lymphocytic leukemias, undergo a characteristic transformation to near haploidy (see, for example, Gibbons et al., Leukemia 5(9):738-743, 1991). In some cases, there are multiple copies of the haploid chromosome set, meaning that, while only one parental copy is preserved in the cell, this chromosome may be present in several copies. Either purified tumor cells or cell lines established from such tumors may be used as a source of DNA or RNA for preparation of monoallelic probes (i.e., probes with unique sequences) for loci on all chromosomes except any retained diploid chromosomes. The problem of diploidy for some chromosomes may be overcome by using several cell lines with non-overlapping diploid chromosomes to make probes. Several immortal cell lines with the desired properties have been described. For example, Kohno et al. (J. Natl. Cancer Inst. 64: 485-493, 1980) have described a cell line called NALM-16, which was established from an acute lymphoblastic leukemia in which only three chromosomes, chromosomes 14, 18, and 21, were disomic at one point in the evolution of the leukemia. In addition, Andersson et al. (Leukemia 9(12):2100-2108, 1995) have described a cell line, KBM-7, which is diploid for only two chromosomes, chromosomes 8 and 15. Together, these two cell lines are useful for preparing monoallelic probes from any position in the genome.
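
The way two cell lines with non-overlapping diploid chromosomes complement each other can be sketched with simple set operations. The diploid chromosome sets below are those given above for NALM-16 and KBM-7; restricting the check to the 22 autosomes and the function names are simplifying assumptions made for illustration.

```python
# Illustrative sketch only: for each autosome, list the cell lines in which
# that chromosome is NOT diploid and which could therefore serve as a
# source of monoallelic probe template.

AUTOSOMES = [str(n) for n in range(1, 23)]

# Chromosomes reported as diploid (disomic) in each line, per the text above.
DIPLOID = {
    "NALM-16": {"14", "18", "21"},
    "KBM-7": {"8", "15"},
}


def hemizygous_sources(chromosome: str) -> list[str]:
    """Cell lines in which the given chromosome is not diploid."""
    return sorted(line for line, dip in DIPLOID.items() if chromosome not in dip)


def uncovered_chromosomes() -> list[str]:
    """Autosomes that are diploid in every listed cell line."""
    return [c for c in AUTOSOMES if not hemizygous_sources(c)]


if __name__ == "__main__":
    print(hemizygous_sources("14"))   # NALM-16 is disomic here, KBM-7 is not
    print(hemizygous_sources("8"))    # KBM-7 is diploid here, NALM-16 is not
    print(uncovered_chromosomes())    # empty if the two lines are complementary
```

Because the two diploid sets do not intersect, every autosome has at least one candidate source line in this model.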

[0067] Solid Cancer Cell Lines with Near Haploid Karyotypes, or that Have Passed Through a Stage of Near Haploidy

[0068] Near-haploid karyotypes have been reported in a variety ofnon-leukemic malignancies, including squamous cell lung cancer (having27 chromosomes), endometrioid cancer of the ovary (having 29chromosomes), malignant fibrous histiocytoma, and renal oncocytoma. Acell line has been established from the lung cancer specimen (describedin Drouin et al., Genes, Chromosomes, and Cancer 7: 209-212, 1993), andis diploid for only chromosomes 5, 7, 22, and the X chromosome. Eitherpurified tumor cells or cell lines established from such tumors areuseful as sources of DNA or RNA for preparation of monoallelic DNA orcDNA probes (i.e., probes having unique sequences) for loci on allchromosomes except any retained diploid chromosomes.

[0069] Individual Germ Cells

[0070] Another example of a cell haploid for all human chromosomes is anindividual germ cell (i.e., a sperm cell or an oocyte). PCR productsfrom, for example, a single sperm cell may be generated and labeled toproduce probes having unique sequences (see, for example, Li et al.,Proc. Natl. Acad. Sci. 87: 4580-4584, 1990). Alternatively, a wholegenome amplification procedure using, for example, primer extensionpreamplification (PEP; for example, as described in Zhang et al., Proc.Natl. Acad. Sci. 89: 5847-5851, 1992; and Casas and Kirkpatrick,Biotechniques 20: 219-225, 1996) may be employed to generate anamplified representation of the sequences in the single sperm cell (orany other single haploid cell, such as an oocyte). From the resultingamplified representation, multiple subsequent amplifications may beperformed to generate a variety of different unique sequence probes.This approach greatly reduces the effort involved in generating largenumbers of unique sequence probes. Similar methods may be applied tooocytes (Cui et al., Genomics 13: 713-717, 1992) and to single somaticcells (Snabes et al., Proc. Natl. Acad. Sci. 91: 6181-6185, 1994), forexample, those somatic cells described below.

[0071] Somatic Cell Hybrids that are Monosomic for One or More Human Chromosomes

[0072] A different approach to the isolation of cell lines that arehaploid for human chromosomes is the use of somatic cell hybrids. Theseare produced by fusing human cells or micronucleated chromosomes tononhuman recipient cells, usually hamster or mouse cell lines. This maybe achieved by transferring a single human chromosome to the non-humanrecipient cell, or by selecting hybrid cell lines in which only a singlehuman chromosome is retained. A collection of such cell lines, eachretaining a different human chromosome, has been assembled at theCoriell Cell Repository (Coriell Institute for Medical Research, Camden,N.J.; see pages 703-743 of their 1994/1995 Catalog of Cell Lines for adescription of these cell lines). The human genes are typicallyexpressed in such hybrids, enabling the isolation of either monoallelicDNA or cDNA using, for example, standard amplification procedures.

[0073] Preparation of Probes by Amplification of Single Copy Genes

[0074] Once a cell is determined to have only one allele of a gene (orfragment thereof) or chromosome (e.g., a hemizygous cell or a somaticcell hybrid), a probe may be prepared from this monoallelic gene orchromosome by any number of standard techniques known in the art (see,e.g., Ausubel et al., supra; Sambrook et al., supra). Particularlyuseful probes of the invention are labelled. In one exemplary method, togenerate a rhodamine-labelled probe, RNA is collected from the cell,cDNA is prepared from the RNA (using, e.g., the Universal RiboClone cDNAsynthesis kit from Promega, Madison, Wis.), and the following PCRprotocol is employed.

[0075] According to this technique, fluorescently labeled probes with high specific activity are produced by incorporation of rhodamine labeled dUTP into a DNA fragment during PCR. The concentration of rhodamine labeled dUTP can be varied to achieve different degrees of fluorescent labeling. Methods for optimizing PCR reactions are well known in the art of molecular biology and include testing various temperatures for the annealing step of the PCR, changing the MgCl₂ concentration, and changing primer and template concentrations. One example of a PCR protocol for use in a variance detection system utilizing T4 endonuclease VII is as follows.

[0076] A 500 μl PCR reaction is set up which contains Taq buffer (1× from a 10× stock; the 10× PCR stock contains 100 mM Tris pH 8.3, 500 mM KCl, and 0.1% gelatin), 200 μM (final concentration) each dNTP, 2.5 mM (final concentration) MgCl₂, 12.5 units of AmpliTaq Gold (Perkin-Elmer Corp., Norwalk, Conn.), 1 μM (final concentration) forward primer, 1 μM (final concentration) reverse primer, 5 μM (final concentration) dUTP Rhodamine (2.5 μl of Molecular Probes Catalog No. C-7629, Eugene, Oreg.), distilled water, and 1.25 μg of cDNA or DNA derived from the hemizygous cell. The amount of cDNA required will vary depending on the expression levels in the hemizygous cell of the gene of interest being amplified. The pH of the Taq buffer and the concentrations of MgCl₂ and KCl may differ from probe to probe depending on the PCR optimization conditions.
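
The volumes in such a reaction follow from the usual dilution relationship C1·V1 = C2·V2. The sketch below applies it only to the two components for which both figures are given above (the 10× buffer, and the 2.5 μl of rhodamine dUTP that yields 5 μM); stock concentrations for the remaining components are not stated here and are deliberately left out. The function names are hypothetical.

```python
# Illustrative sketch only: dilution arithmetic for the 500 ul reaction.

def stock_volume_ul(final_conc: float, stock_conc: float, total_volume_ul: float) -> float:
    """Volume of stock needed; final_conc and stock_conc share the same units."""
    return final_conc * total_volume_ul / stock_conc


def implied_stock_conc(final_conc: float, added_volume_ul: float, total_volume_ul: float) -> float:
    """Stock concentration implied by the volume added and the final concentration."""
    return final_conc * total_volume_ul / added_volume_ul


TOTAL_UL = 500.0

if __name__ == "__main__":
    # 1x buffer from a 10x stock
    print(stock_volume_ul(1.0, 10.0, TOTAL_UL))    # 50.0 (ul of 10x buffer)
    # 2.5 ul of labeled dUTP giving 5 uM final
    print(implied_stock_conc(5.0, 2.5, TOTAL_UL))  # 1000.0 (uM, i.e. about 1 mM)
```

Taken at face value, the stated figures imply a labeled dUTP stock of about 1 mM; this is an inference from the numbers above rather than a supplier specification.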

[0077] The reaction mix is then divided into 5 PCR tubes, such that each tube contains 100 μl. The DNA is amplified in a PCR machine (e.g., a Perkin Elmer 9600) using, for example, the following cycle parameters: 95° C. for 12 minutes; then 30 cycles of: 94° C. for 20 seconds, optimal annealing temperature for 30 seconds, 72° C. for 45 seconds; followed by: 72° C. for 12 minutes; and then holding at 4° C. The optimal annealing temperature for each probe is determined by an initial PCR optimization procedure and may differ between probes.
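
The cycle parameters can be written down as a small data structure, which makes it easy to total the programmed hold times. This is a sketch under stated assumptions: the class and variable names are hypothetical, the annealing temperature is left unset because it is probe-specific, and ramp times between temperatures and the final 4° C. hold are ignored.

```python
# Illustrative sketch only: the example thermal cycling program as data.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: Optional[float]  # None means probe-specific, determined empirically
    seconds: int


INITIAL = Step("initial denaturation", 95, 12 * 60)
CYCLE = [
    Step("denature", 94, 20),
    Step("anneal", None, 30),
    Step("extend", 72, 45),
]
N_CYCLES = 30
FINAL = Step("final extension", 72, 12 * 60)


def total_hold_seconds(initial: Step, cycle: list, n_cycles: int, final: Step) -> int:
    """Sum of programmed hold times, excluding ramping and the 4 C hold."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


if __name__ == "__main__":
    total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(total, "seconds =", total / 60, "minutes")  # 4290 seconds = 71.5 minutes
```

The actual run time on an instrument will be longer than this total, since heating and cooling between steps are not counted.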

[0078] The probe reaction mixtures are combined into one tube (0.5 ml total) and added to 1.5 ml of TE buffer (10 mM Tris pH 8.0, 1 mM Na₂EDTA). The 2 ml mixture is then added to a Centricon-50 cartridge (Amicon Inc., Beverly, Mass.) and centrifuged in a Sorvall SS-34 rotor at 5500 rpm for 15 minutes. The filters are washed by adding 2 ml of TE buffer to the cartridge and centrifuging at 5500 rpm for 15 minutes, and repeating for a total of three washes. The Centricon-50 filters are then inverted, and the labeled PCR product (i.e., the probe) recovered by centrifuging at 2000 rpm for 2 minutes. (The flow-through volume at the bottom of the cartridge contains the recovered probe.) The OD₂₆₀ and OD₂₈₀ are read to determine the concentration of the probe recovered from the filter. Serial dilutions (1:2) of the probe are then made, and the fluorescence measured by electrophoresis on an automated DNA sequencer (e.g., an ABI 377 sequencer), capillary electrophoresis/laser induced fluorescence instrument, or a similar device with detectors tuned to the emitting wavelength of the fluorophore. This information is then used to calculate the specific activity of the probe (fluorescence incorporation per femtomole of PCR product), so that the probe can then be mixed with a sample in an amount sufficient for the detection of probe-sample heteroduplexes.
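
The quantitation steps can be sketched as a short calculation. The sketch assumes the common approximation that an OD₂₆₀ of 1.0 corresponds to about 50 ng/μl of double stranded DNA in a 1 cm path, an average mass of 650 g/mol per base pair, and no contribution of the fluorophore to the absorbance reading; none of these figures comes from this description, and the function names are hypothetical.

```python
# Illustrative sketch only: probe concentration, 1:2 dilution series, and
# specific activity (fluorescence per femtomole of PCR product).

DSDNA_NG_PER_UL_PER_A260 = 50.0  # assumed approximation for dsDNA, 1 cm path
AVG_BP_MASS = 650.0              # g/mol per base pair (assumed average)


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from an OD260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def fmol_per_ul(ng_per_ul: float, length_bp: int) -> float:
    """Convert ng/ul to fmol/ul (ng -> g is 1e-9, mol -> fmol is 1e15)."""
    return ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS)


def serial_dilutions(start: float, steps: int, factor: float = 2.0) -> list:
    """Concentrations in a dilution series, beginning with the undiluted value."""
    return [start / factor**i for i in range(steps)]


def specific_activity(fluorescence_units: float, fmol_loaded: float) -> float:
    """Fluorescence signal per femtomole of probe loaded."""
    return fluorescence_units / fmol_loaded


if __name__ == "__main__":
    conc = dsdna_ng_per_ul(a260=0.2)            # hypothetical reading
    molar = fmol_per_ul(conc, length_bp=500)    # hypothetical 500 bp product
    print(serial_dilutions(molar, steps=4))
```

The fluorescence values themselves would come from the sequencer or capillary instrument; only the arithmetic around them is shown.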

[0079] Method of Determining Whether a Gene is Present in a Single Parental Copy

[0080] In general, there are two preferred methods for determining if acell line contains only one allelic form of a gene, and is thereforesuitable for making monoallelic probes having a unique sequence whichare useful in the invention. The two methods are applicable in differentsituations. A cell that has only one copy of a gene or chromosome canbest be identified by cytogenetic or molecular analysis. A cell that hasmore than one copy of a single allelic form of a gene can best beidentified by molecular analysis.

[0081] I. Gene is Present in a Single Copy in a Cell

[0082] Cytogenetic analysis can reveal the number of copies in a cell of a specific DNA sequence or of a collection of DNA sequences (for example, a collection of DNA sequences spread over an entire chromosome). The methods for cytogenetic analysis are well known in the art (see, for example, Chapter 4 of Dracopoli, N. et al., Current Protocols in Human Genetics, John Wiley and Sons, 1997, and references cited therein). Briefly, the steps for cytogenetic analysis are as follows: (i) arrest dividing cells in interphase or metaphase with mitosis blocking drugs such as colchicine; (ii) make the cells permeable to DNA probes (for interphase studies), or make a chromosome spread by dropping cells on glass slides; (iii) hybridize labelled denatured DNA corresponding to a DNA segment or segments (which could represent a gene, chromosome, or other segment of interest); (iv) wash off unhybridized probe; and (v) visualize the probe-bound molecules using microscopy. The DNA or chromatin is stained to allow visualization of the probe in the context of the chromosomes.

[0083] A variety of image enhancing methods can be used to aidvisualization of the bound probe. Probes are often labelled withfluorescent tags and used in a technique that is referred to asfluorescent in situ hybridization, or FISH (Speicher, M. R. et al.,Nature Genetics 12: 368-375, 1996). Since the labelled probe hybridizesto each cellular copy of the target DNA segment, the number of copies ofthe target gene can be determined by simply counting the number ofhybridizing spots. FISH has been used extensively to count the number ofcopies of specific sequences or chromosomes (Barks, J. H. et al., GenesChromosomes Cancer 19: 278-285, 1997; El-Naggar, A. K. et al., HumanPathology 28: 881-886, 1997). When one or more chromosomes from a cellline are shown to be present in one copy, that cell line can then beused to prepare monoallelic probes for any gene located on the singlecopy chromosome or chromosomes. Virtually all near-haploid tumor celllines described in the literature have been identified by karyotyping.

[0084] II. Gene is Present in Multiple Copies in a Cell

[0085] In order to be suitable for the production of monoallelic probes,it is not necessary that a cell contain a gene in a single copy; allthat is required is the presence of a single allelic form of the gene inthe cell or population of cells. Hence, some cells may have two or morecopies of the same gene, chromosome, or set of chromosomes. This canoccur by aberrant fertilization, as in the case of hydatidiform moles,where two copies of the same paternal chromosome set are often found.Alternatively, it can occur in cancers as a result of chromosomenondisjunction, or partial or complete chromosome deletion, followed byduplication (sometimes called endoreduplication) of the remainingchromosome or chromosomes. Other instances when this occurs aredescribed herein. In order to determine if such a cell contains only oneallelic form of a gene or genes or other DNA segment, it is best to usemolecular techniques.

[0086] Molecular analysis can reveal the presence of different copies of a specific DNA sequence that is polymorphic (i.e., known to vary in a population). The absence of two polymorphic forms in a cell indicates that at the sampled site the cell contains only one allelic form, or is homozygous at that site. The demonstration of homozygosity at many sites located on the same chromosome is evidence of homozygosity for a region of a chromosome, or for an entire chromosome. For example, consider a cell or tissue sample that has been genotyped at 25 known sites of DNA sequence variation, all mapped to the same chromosome and known to be distributed along the length of the chromosome. If, in a normal population, 50% of subjects are heterozygotes at each of the 25 sites, but in the genotyped sample all 25 sites are homozygous, then it is extremely likely that the sample has only one allelic form of every gene on the tested chromosome, regardless of the number of chromosomal copies in each cell. Specifically, the likelihood of finding by chance a cell homozygous for all 25 markers, given that each is heterozygous in 50% of individuals, is one in 2 to the 25th power (2²⁵), or one in 33,554,432.
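
The arithmetic behind this estimate can be reproduced with exact fractions. The sketch below treats the markers as statistically independent, which is the simplification implicit in the example; markers on the same chromosome may in practice be correlated. The function name is hypothetical.

```python
# Illustrative sketch only: chance that an ordinary two-copy sample is
# homozygous at every tested marker, treating markers as independent.
from fractions import Fraction


def prob_all_homozygous(heterozygosity_rates) -> Fraction:
    """Product over markers of (1 - heterozygosity rate)."""
    p = Fraction(1)
    for h in heterozygosity_rates:
        p *= 1 - Fraction(h)
    return p


if __name__ == "__main__":
    rates = [Fraction(1, 2)] * 25       # 25 markers, each heterozygous in 50%
    p = prob_all_homozygous(rates)
    print(p)                            # 1/33554432
    print(p == Fraction(1, 2**25))      # True
```

With less informative markers the same function shows how many more sites would be needed to reach a comparable level of confidence.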

[0087] Many polymorphic DNA markers have been identified in man and may be used to perform the genotyping analysis (see, for example, the Genethon human genetic linkage map, published as an appendix to Dib et al., Nature 380: 152-154, 1996). The ideal candidate polymorphic sites are sites that can be assayed by polymerase chain reaction (PCR), and that are highly polymorphic. Short tandem repeat polymorphisms (STRs) are di-, tri-, or tetra-nucleotide sequences (for example, the di-nucleotide CA) that repeat for a variable number of times in different alleles. They can be tested by PCR amplification of the STR using flanking primers in unique sequence, followed by gel electrophoresis to resolve alleles by size. PCR allows the use of small amounts of sample DNA and the automation of genotyping. Other polymorphic DNA markers that may be used for genotyping include restriction fragment length polymorphisms (RFLPs) and variable number of tandem repeat polymorphisms (VNTRs). Methods for genotyping DNA samples using these types of DNA markers are provided in Dracopoli, N. et al. (Current Protocols in Human Genetics, John Wiley and Sons, 1997; see, in particular, Chapter 2).
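
Screening a candidate cell line with STR markers amounts to checking whether any marker resolves into more than one allele size. The following is a minimal sketch with hypothetical marker names and fragment sizes; it assumes that allele sizes have already been called from the gel, and it cannot by itself distinguish a single copy of an allele from several identical copies.

```python
# Illustrative sketch only: flag markers at which more than one allele size
# was observed. Marker names and sizes are hypothetical examples.

def heterozygous_markers(genotypes: dict) -> list:
    """Return the markers whose observed allele sizes (in bp) are not all equal."""
    return sorted(m for m, sizes in genotypes.items() if len(set(sizes)) > 1)


def single_allelic_form(genotypes: dict) -> bool:
    """True if every tested marker shows exactly one allele size."""
    return not heterozygous_markers(genotypes)


if __name__ == "__main__":
    candidate = {"marker_A": (152, 152), "marker_B": (201,), "marker_C": (118, 118)}
    control = {"marker_A": (152, 156), "marker_B": (201, 201), "marker_C": (118, 124)}
    print(single_allelic_form(candidate))   # True
    print(heterozygous_markers(control))    # ['marker_A', 'marker_C']
```

A sample that passes this check at many well-distributed markers is a candidate source of monoallelic template for the tested chromosome.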

[0088] Variance Detection by SSCP

[0089] One technique commonly employed in the identification of single nucleotide differences is the single strand conformation polymorphism (SSCP) method (Orita et al., Genomics 5: 874-879, 1989). This methodology is effective for scanning PCR products for sequence variants, and is a standard technique in human genetics for variance detection, with numerous studies of its efficacy (greater than 90% with optimized protocols) and schemes for improved throughput. The probes of the invention are useful in the SSCP method because they provide good controls, since the probe will give rise to only one allelic form of the sequence being analyzed. The band or bands produced from a single allele are often useful in interpreting more complex patterns of bands produced from heterozygous samples, because it is possible to determine unambiguously which molecular species are derived from each allele in the heterozygous sample. Likewise, samples from the cells described herein are also useful in the SSCP technique, since it is helpful to have samples with known alleles for probe production.

[0090] Variance Detection by DGGE

[0091] Denaturing gradient gel electrophoresis (DGGE) represents a preferred technique for the identification of DNA sequence variances in genomic DNA or cDNA, or in PCR products amplified from genomic DNA or cDNA. The DGGE method was originally described by Fischer and Lerman (“Two Dimensional Electrophoretic Separation of Restriction Enzyme Fragments of DNA,” Methods in Enzymology 68: 183-191, 1979; “DNA Fragments Differing by Single Base-Pair Substitutions are Separated in Denaturing Gradient Gels: Correspondence with Melting Theory,” Proc. Natl. Acad. Sci. U.S.A. 80: 1579, 1983) and has since been improved by many investigators (see, for example, Myers et al., “Mutation Detection by PCR, GC-Clamps, and Denaturing Gradient Gel Electrophoresis,” (pp. 71-88) in Erlich, H. A. (ed.), PCR Technology: Principles and Applications for DNA Amplification, Stockton Press, New York, 1989; Myers et al., “Detecting Changes in DNA: Ribonuclease Cleavage and Denaturing Gradient Gel Electrophoresis,” (pp. 95-139) in Davies, K. E. (ed.): Genomic Analysis: A Practical Approach, IRL Press Ltd., Oxford, 1988; and Abrams and Stanton Jr., Methods in Enzymology 212: 71-104, 1992).

[0092] The basic principle of DGGE involves the creation of a gradient of denaturant in a gel, which is then used to resolve double stranded DNA (or RNA) fragments on the basis of conformational differences associated with strand melting. The denaturant can be chemical (as in DGGE, where a gradient of formamide and urea is typically used) or thermal (as in a related technique called thermal gradient gel electrophoresis, or TGGE, where a gradient of heat is used). To obtain conditions where double stranded DNA is close to melting, DGGE gels are immersed in a heated bath of electrophoresis buffer, while TGGE gels have a fixed concentration of chemical denaturant.

[0093] As a double stranded DNA molecule migrates through a DGGE gel from a low concentration of denaturant at the origin to higher concentrations of denaturant toward the end of the gel, it eventually reaches a level of denaturant that will cause partial melting. Some design of the DNA molecules (e.g., addition of a GC clamp) is often necessary to ensure that the partial melting will occur as desired. The concentration of denaturant required to melt a given DNA segment is highly sensitive to sequence differences in the DNA, including changes as subtle as a single nucleotide substitution. Partially melted DNA fragments move through gels at a much slower rate than their fully duplex counterparts. Thus, two DNA fragments differing at a single nucleotide can be distinguished on the basis of their gel position after an appropriate period of electrophoresis; the fragment with the more stable structure (resulting from, for example, a G:C base pair in place of an A:T pair) will travel further in the gel than its less stable counterpart, because it will encounter the concentration of denaturant required to melt it (and consequently dramatically retard or nearly stop its movement) at a point further along in the gel.

[0094] The DGGE method reveals the presence of sequence variation between individuals as shifts in electrophoretic mobility, but does not show the sequence itself. Direct sequencing of DNA fragments (from different individuals) with altered mobility in the DGGE assay will reveal the precise sequence differences among them (see, e.g., Ausubel et al., supra for standard sequencing techniques). From the nucleic acid sequence data, the amino acid sequence can be determined, and any amino acid differences can be identified.

[0095] The DGGE method is suitable for analysis of restriction enzyme digested genomic DNAs, as initially described by Lerman and co-workers (supra) and later extended (Gray, M., Amer. J. of Human Genetics 50: 331-346, 1992). DGGE is equally suitable for analysis of cloned DNA fragments or DNA fragments produced by PCR. The analysis of cloned fragments or PCR fragments has the advantage that non-natural sequences, rich in G and C nucleotides, can easily be added to the 5′ ends (either flanking the cloning site or at the 5′ ends of PCR primers). Such DNA fragments have very stable double stranded segments, called GC clamps, at one or both ends. The GC clamps alter the melting properties of the fragments, and can be designed to ensure melting of the inter-primer segment of the PCR product at a lower temperature than the clamps, thereby optimizing the detection of sequence differences (see Myers et al., supra and Myers et al., Nucleic Acids Research 13: 3131, 1985). GC clamps can be rationally designed for any specific DNA fragment of known sequence by use of a computer program (e.g., the MELT94 program written by L. Lerman and used in Michikawa et al., Nucl. Acids Res. 25(12): 2455-2463, 1997) that accurately predicts melting behavior based on analysis of primary sequence. When GC clamps are used correctly, the DGGE method is highly efficient at detecting DNA sequence differences. Not only are nearly 100% of differences detected, but the false positive rate is essentially zero (Abrams, E. S. et al., Genomics 7: 463-475, 1990). Recently, methods for increasing the throughput of DGGE have been developed, based on multiplex PCR.

[0096] In general, the steps for carrying out DGGE with GC clamps are as follows.

[0097] 1. Design DNA fragments with optimal melting behavior. Oligonucleotide primers are selected, using GC clamps as necessary, to produce a single melting domain over the length of the sequence to be analyzed. It may be necessary to divide the sequence into overlapping fragments to achieve this goal. Design of primers and simulated analysis of fragments can be performed with the computer program described by Lerman (Lerman and Silverstein, Methods in Enzymology 155: 482-501, 1987). The output of the program is the melting map of the fragment, from which it will also be possible to determine the optimal range of denaturant in the gradient and the approximate electrophoresis time for fragments to reach the point of melting in the gradient.

[0098] 2. Amplify test fragments by PCR. Procedures for optimizing PCR are briefly described in the specification and are well known in the art. Template DNA samples may be either cDNA or genomic DNA and are typically drawn from a panel of unrelated individuals.

[0099] 3. Prepare a probe fragment and form heteroduplexes between probe and sample. The use of probes has the advantage that all the test samples are compared to a single reference sample (often of known sequence), simplifying analysis of results; the probe also serves as a standard or marker for the mobility of a specific allele (since mobility of fragments in DGGE is not proportional to length, it can be difficult to identify fragments with certainty). If samples are labelled, use of a probe allows one labelling procedure to be used for a large number of samples. PCR is the best way to prepare probes because a GC clamp can be attached to the sequence to be analyzed using a long primer. Template DNA samples can be either cDNA or genomic DNA. Optimally, the probe will be produced from a hemizygous or cloned template, ensuring that the probe represents a single unique sequence.

[0100] 4. Pour a denaturing gradient gel. Briefly, two gel solutions are made containing the desired beginning and end concentrations of denaturant. The gel solutions are generally made by mixing “0%” and “100%” denaturant stock solutions, where the 0% stock consists of 7% acrylamide in Tris-acetate EDTA (TAE) electrophoresis buffer, and the 100% stock is also 7% acrylamide in TAE, plus 40% formamide by volume and seven molar urea. Equal volumes of the two solutions (e.g., twelve milliliters of each solution) are poured into the two chambers of a gradient maker (usually between 20 and 40% denaturant in the upstream chamber and 60 to 80% in the lower one) immediately after addition of ammonium persulfate and TEMED for acrylamide polymerization. The gradient gel can then be poured by opening the stopcock of the gradient maker. Usually gels are 0.75 to 1 mm in thickness, and gel combs that form 10-30 wells are used. With commercially available apparatus, multiple gradient gels can be poured simultaneously. Suitable apparatus is sold by several vendors, including the BioRad (Hercules, Calif.) Decode system and the C.B.S. Scientific DGGE system.

[0101] 5. Place the gel in a heated bath of electrophoresis buffer. Gels are electrophoresed at elevated temperature which, together with the denaturant, brings the DNA fragments to their melting point. Gels are often run at 60° C. in 1×TAE buffer, with constant recirculation of buffer to the upper buffer chamber. Once the gel has been placed in the heated tank and allowed to equilibrate, it can be loaded. Multiple gels can be run simultaneously in the same tank with the apparatus listed above.

[0102] 6. Load and run gel. Enough PCR product from each sample may be loaded on the gel so that samples can be detected by a simple DNA staining procedure, thereby avoiding use of radioactivity, dyes or hybridization procedures. To achieve this, at least 100 ng of each sample should be loaded, but preferably over 200 ng. Gel running conditions can be estimated from the output of the MELT87 program (used in Desbois et al., Hum. Mutat. 2(5): 395-403, 1993); however, empirical adjustments are often necessary. Usually a voltage of approximately 80 V to 200 V is applied for periods of 5-20 hours, depending on the characteristics of the fragments being analyzed.

[0103] 7. Stain and analyze gel. After electrophoresis, gels are stained with, for example, ethidium bromide, SYBR Green, or silver. The location of PCR products produced with the same primer pairs is compared. Altered location, and usually the appearance of two or more bands instead of one, signifies the presence of DNA sequence differences. More than two bands from a diploid sample are often present because, during the terminal cycle of heating and cooling of the PCR step, heteroduplexes are formed between the maternally and paternally inherited alleles. If those alleles differ in sequence, the heteroduplexes will have mispaired nucleotides at the sites of difference. As a result, the heteroduplexes will be less stable than either of the homoduplex species, and will consequently melt and be retarded in the gel at a lower concentration of denaturant. Altogether, one may see four bands in such samples: two reciprocal heteroduplexes and two homoduplexes. The specific pattern of fragments in each lane constitutes a signature for a specific nucleotide change.

[0104] 8. Sequence DNA fragments with altered mobility. Examples of all different signatures are next analyzed by DNA sequencing to identify the base difference(s) accounting for altered mobility in the gradient gel. Sequencing is performed according to standard techniques.
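
The arithmetic of step 4 (mixing the 0% and 100% denaturant stocks to obtain each chamber solution) can be sketched as follows. This is a minimal illustration that assumes simple volume-proportional mixing; the function names are hypothetical, and the 30%/70% gradient is just one choice within the ranges given above.

```python
def stock_volumes(target_percent, total_ml):
    """Return (ml of 0% stock, ml of 100% stock) for one gel solution.

    Assumes the denaturant percentage scales linearly with the volume
    fraction of 100% stock, both stocks being 7% acrylamide in TAE.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    ml_full = total_ml * target_percent / 100.0
    return total_ml - ml_full, ml_full


def composition(target_percent):
    """Formamide (% v/v) and urea (M) at a given denaturant percentage.

    The 100% stock is defined in the text as 40% formamide and 7 M urea.
    """
    return {
        "formamide_percent_v_v": 40.0 * target_percent / 100.0,
        "urea_molar": 7.0 * target_percent / 100.0,
    }


# Twelve milliliters for each chamber of a 30% to 70% gradient
for percent in (30, 70):
    zero_ml, full_ml = stock_volumes(percent, 12)
    print(percent, round(zero_ml, 2), round(full_ml, 2), composition(percent))
```

Under these assumptions a 30% solution is 8.4 ml of 0% stock plus 3.6 ml of 100% stock, and the mix is made just before ammonium persulfate and TEMED are added.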

[0105] Variance Detection Using a T4 Endonuclease VII Mismatch Cleavage Method

[0106] The enzyme T4 endonuclease VII is derived from the bacteriophage T4. T4 endonuclease VII is used by the bacteriophage to cleave branched DNA intermediates which form during replication so that the DNA can be processed and packaged. T4 endonuclease VII can also recognize and cleave heteroduplex DNA containing single base mismatches as well as deletions and insertions. This activity of the T4 endonuclease VII enzyme can be exploited to detect sequence variances present in the general population.

[0107] In one particular approach to the identification of sequence variations using a T4 endonuclease VII mismatch cleavage assay, 400-600 bp regions from a candidate gene are amplified in a panel of DNA samples (for example, cDNA or genomic DNAs representing some cross section of the world population), for example, by the polymerase chain reaction, and are mixed with a labeled probe. Heating and cooling the mixtures allows heteroduplex formation between the probe and sample DNA. To this mixture is added T4 endonuclease VII, which will recognize and cleave at sequence variance mismatches formed in the heteroduplex DNA. Electrophoresis of the cleaved fragments, for example, on an automated DNA sequencer, may be used to determine the site of cleavage. To more specifically pinpoint the variance site, a subset of the PCR fragments identified by T4 endonuclease VII cleavage as containing variances may be sequenced in the region of cleavage to establish the specific location and nature of the base variation.

[0108] For carrying out the above detection method, a candidate gene sequence may be downloaded from an appropriate database, and primers for PCR amplification may be designed which result in the target sequence being divided into amplification products of between 400 and 600 bp. Preferably, there will be a minimum of a 50 bp overlap (not including the primer sequences) between the 5′ and 3′ ends of adjacent fragments, to ensure the detection of variances which are located close to one of the primers.
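
One way to divide a target sequence into overlapping amplification products is sketched below. It is an illustration under simplifying assumptions: coordinates describe the inter-primer segments only (primer sequences are ignored), every amplicon has the same length, and the function name and default values are hypothetical.

```python
def tile_amplicons(target_len, amplicon_len=500, min_overlap=50):
    """Return (start, end) coordinates of overlapping amplicons.

    Adjacent amplicons overlap by at least min_overlap bp, and the last
    amplicon is shifted back so that it ends exactly at the target end.
    """
    if not 400 <= amplicon_len <= 600:
        raise ValueError("amplicon_len should be between 400 and 600 bp")
    if not 0 <= min_overlap < amplicon_len:
        raise ValueError("min_overlap must be smaller than amplicon_len")
    if target_len <= amplicon_len:
        return [(0, target_len)]  # short target: a single product suffices

    step = amplicon_len - min_overlap
    tiles = []
    start = 0
    while start + amplicon_len < target_len:
        tiles.append((start, start + amplicon_len))
        start += step
    # Final amplicon anchored at the 3' end; its overlap is >= min_overlap
    tiles.append((target_len - amplicon_len, target_len))
    return tiles


print(tile_amplicons(1200))
```

Anchoring the final amplicon at the end of the target keeps all products the same length at the cost of a larger overlap for the last pair.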

[0109] Optimal PCR conditions for each of the primer pairs are determined experimentally. Parameters including, but not limited to, annealing temperature, pH, MgCl₂ concentration, and KCl concentration may be varied until conditions for optimal PCR amplification are established. The PCR conditions derived for each primer pair are then used to amplify a panel of DNA samples (cDNA or genomic DNA) which is chosen to best represent the various ethnic backgrounds of the world population or some designated subset of that population, for example, a population with a specific disease or therapeutic response.

[0110] For variance detection, one DNA source is chosen to be used as a probe. Optimally this DNA source should be one of the sources of unique sequence DNA described above. The same PCR conditions used to amplify the panel are used to amplify the probe DNA. However, a labeled nucleotide (such as a fluorescently labeled nucleotide) is included in the deoxynucleotide mix, such that a percentage of the incorporated nucleotides will be fluorescently labeled. The labeled probe is mixed with the corresponding PCR products from each of the DNA samples, and then heated and cooled rapidly to allow the formation of heteroduplexes between the probe and the PCR fragments from each of the DNA samples. T4 endonuclease VII is added directly to these reactions and allowed to incubate for 30 minutes at 37° C. Then 10 μl of a formamide loading buffer is added directly to each of the samples, which are then denatured by heating and cooling.

[0111] A portion of each sample is electrophoresed, for example, on an ABI 377 sequencer or by capillary electrophoresis. If there is a sequence variance between the probe DNA and the sample DNA, a mismatch will be present in the heteroduplex fragment formed. The enzyme T4 endonuclease VII will recognize the mismatch and cleave at the site of the mismatch. This will result in the appearance of two peaks corresponding to the two cleavage products when run on the ABI 377 sequencer. If there are two differences between probe and sample, then there will be two cleavages, resulting in three fragments, and so on.
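
The relationship between the number of mismatches and the number of cleavage fragments can be illustrated with a small sketch. It assumes an idealized model in which the probe and sample strands are the same length, every mismatch is cut completely, and the cut falls exactly at the mismatched position; the sequences and function names are hypothetical.

```python
def mismatch_positions(probe, sample):
    """Indices at which two equal-length sequences differ."""
    if len(probe) != len(sample):
        raise ValueError("this sketch only handles substitutions, not indels")
    return [i for i, (p, s) in enumerate(zip(probe, sample)) if p != s]


def cleavage_fragments(probe, sample):
    """Fragment lengths after complete cutting at every mismatch.

    n mismatches give n + 1 fragments; a perfect match stays uncut.
    """
    edges = [0] + mismatch_positions(probe, sample) + [len(probe)]
    # Drop zero-length pieces (a mismatch at the very first position)
    return [right - left for left, right in zip(edges, edges[1:]) if right > left]


# Hypothetical 600 nt probe and a sample differing at positions 100 and 400
probe = "A" * 600
sample = probe[:100] + "G" + probe[101:400] + "G" + probe[401:]
print(cleavage_fragments(probe, probe))   # uncut duplex
print(cleavage_fragments(probe, sample))  # two cuts, three fragments
```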

[0112] Fragments identified as containing variances are subsequently sequenced using conventional methods to establish the exact location and nature of the variance.

[0113] Other methods for carrying out T4 endonuclease VII assays are described, for example, in Cotton et al., U.S. Pat. No. 5,698,400 and Babon et al., U.S. Ser. No. 08/545,404.

[0114] Use of Hemizygous Probes for Haplotyping

[0115] In any diploid cell, there are two haplotypes at any gene or other chromosomal segment that contain at least one distinguishing variance. In many well-studied genetic systems, haplotypes are more powerfully correlated with phenotypes than single nucleotide variances. Thus, the determination of haplotypes is valuable for understanding the genetic basis of a variety of phenotypes including disease predisposition or susceptibility, response to therapeutic interventions, and other phenotypes of interest in medicine, animal husbandry, and agriculture.

[0116] In samples of DNA or cDNA derived from tissues or cells that have two copies of a chromosome (i.e., all normal somatic tissues in humans and animals) in which there are two or more heterozygous sites, it is generally impossible to tell which nucleotides belong together on one chromosome when using genotyping methods such as (i) DNA sequencing, (ii) nucleic acid hybridization of oligonucleotides to genomic DNA or total cDNA or amplification products derived therefrom, (iii) nucleic acid hybridization using probes derived from genomic DNA or total cDNA or amplification products derived therefrom, or (iv) most amplification-based schemes for variance detection.

[0117] Haplotypes can be inferred from genotypes of related individuals by using a pedigree to sort out the transmission of groups of neighboring variances, but pedigree analysis is of little or no use when unrelated individuals are the subject of investigation, as is frequently the case in medical studies. There are some methods for determining haplotypes in unrelated individuals, for example, methods based on setting up allele-specific PCR primers for each of two variances that are being scanned (Michalatos-Beloin et al., Nucl. Acids Res. 24: 4841-4843, 1996); however, these methods generally require customization for each locus to be haplotyped, and can therefore be time-consuming and expensive. Such differential priming methods do not rely on the use of probes.

[0118] The production of hemizygous probes from hemizygous cells as described herein is useful for the determination of haplotypes, using a procedure based on formation and detection of heteroduplexes between probe and sample strands. As shown in the example schematically diagrammed in FIG. 1, two variances 300 nucleotides (nts) apart can be amplified together on a 600 nucleotide (nt) PCR product. There are four possible alleles (numbered 1-4 to the left of FIG. 1), comprising all the possible pair-wise combinations of variances. Individuals heterozygous at both variance #1 and variance #2 can be of only two types: heterozygotes for alleles 1+4 or for alleles 2+3. The other four possible combinations of different alleles, 1+2, 1+3, 2+4 and 3+4, are heterozygous at only one position. For these single heterozygotes, there is no problem determining haplotypes; a genotyping procedure is sufficient. (Note that if each of the four alleles is duplicated, there are also four classes of homozygotes.)
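
The counting in this paragraph can be checked by enumeration. In the sketch below each allele is written as a tuple with one entry per variance; numbering the four two-variance alleles 1-4 in this order is an assumption, chosen so that the double heterozygotes come out as 1+4 and 2+3 as stated above.

```python
from itertools import combinations_with_replacement, product


def haplotypes(n_variances):
    """All 2**n possible alleles for n biallelic variances."""
    return list(product((0, 1), repeat=n_variances))


def heterozygous_sites(allele_a, allele_b):
    """Number of positions at which two alleles differ."""
    return sum(x != y for x, y in zip(allele_a, allele_b))


alleles = haplotypes(2)  # assumed numbering: allele k is alleles[k - 1]
genotypes = list(combinations_with_replacement(range(1, 5), 2))

classes = {}  # number of heterozygous sites -> list of allele pairs
for first, second in genotypes:
    n_het = heterozygous_sites(alleles[first - 1], alleles[second - 1])
    classes.setdefault(n_het, []).append((first, second))

print(len(genotypes), "genotype classes in total")
print("double heterozygotes:", classes[2])
print("single heterozygotes:", classes[1])
print("homozygotes:", classes[0])
```

Only the double heterozygotes leave the phase of the two variances undetermined after genotyping.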

[0119] Hemizygous probes derived from each of the four alleles may be used to type the 10 possible classes of allele pairs. Table II below shows the cleavage fragments that would be generated by a resolvase, such as T4 endonuclease VII, or by chemical cleavage, after formation of heteroduplexes between samples, with the genotypes (allele pairs) shown across the top, and the labelled probes, shown (by allele number) in the left column.

TABLE II
Cleavage Products of FIG. 1
(Product sizes in nt; - indicates no product of that size.)

        Genotype of the sample
Probe   Double        Single                      Homozygotes
allele  heterozygotes heterozygotes
        1+4    2+3    1+2    1+3    2+4    3+4    1+1    2+2    3+3    4+4
1       -      500    -      500    -      500    -      -      2×500  -
        -      400    400    -      400    -      -      2×400  -      -
        300    -      -      -      300    300    -      -      -      2×300
        200    200    200    -      2×200  200    -      2×200  -      2×200
        100    100    -      100    100    2×100  -      -      2×100  2×100
2       500    -      -      -      500    500    -      -      -      2×500
        400    -      400    400    -      -      2×400  -      -      -
        -      300    -      300    -      300    -      -      2×300  -
        200    200    200    2×200  -      200    2×200  -      2×200  -
        100    100    -      100    100    2×100  -      -      2×100  2×100
3       500    -      500    500    -      -      2×500  -      -      -
        400    -      -      -      400    400    -      -      -      2×400
        -      300    300    -      300    -      -      2×300  -      -
        200    200    200    -      2×200  200    -      2×200  -      2×200
        100    100    2×100  100    100    -      2×100  2×100  -      -
4       -      500    500    -      500    -      -      2×500  -      -
        -      400    -      400    -      400    -      -      2×400  -
        300    -      300    300    -      -      2×300  -      -      -
        200    200    200    2×200  -      200    2×200  -      2×200  -
        100    100    2×100  100    100    -      2×100  2×100  -      -

[0120] Table II shows a model that assumes, for ease of presentation, complete cutting at sites of base mispairing. Use of this assay, however, does not rely upon complete cutting since there is a predictable relationship between the presence of mispaired bases (cleavage sites) and the ratio of products produced that will generally allow data such as that shown in Table II to be inferred from actual data.
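
The complete-cutting model behind Table II can be reproduced in a few lines. The sketch is illustrative only: the cut positions (100 and 400 on the 600 nt product) and the encoding of alleles 1-4 are assumptions, chosen so that probe #1 yields 400+200 nt products against allele 2, 500+100 nt against allele 3, and 300+200+100 nt against allele 4; FIG. 1 may place the variances differently.

```python
from collections import Counter

FRAGMENT_NT = 600
CUT_SITES = (100, 400)  # assumed positions of the two variances
ALLELES = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}  # assumed encoding


def products(probe, allele):
    """Fragment sizes from one probe/allele duplex, assuming complete cutting."""
    cuts = [site
            for site, p, a in zip(CUT_SITES, ALLELES[probe], ALLELES[allele])
            if p != a]
    if not cuts:
        return []  # perfectly matched duplex: nothing is cleaved
    edges = [0] + cuts + [FRAGMENT_NT]
    return [right - left for left, right in zip(edges, edges[1:])]


def table_cell(probe, genotype):
    """Count of each product size for a diploid sample (a pair of alleles)."""
    counts = Counter()
    for allele in genotype:
        counts.update(products(probe, allele))
    return counts


for genotype in [(1, 4), (2, 3), (1, 2), (4, 4)]:
    print("probe 1 x sample", genotype, dict(table_cell(1, genotype)))
```

A count of 2 for a given size corresponds to an entry such as 2×300 in the table, that is, the same product arising from both alleles of the sample.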

[0121] For example, in the combination of probe #1 with heterozygous sample 1+4, Table II shows cleavage products of 300 nt, 200 nt, and 100 nt. There might also be incompletely cleaved products of 300+200=500 nt and 300+100=400 nt. However, comparison of the intensity of the 400 nt cleavage products between probe #1/sample 1+4 and probe #1/sample 1+2, or the 500 nt cleavage products between probe #1/sample 1+4 and probe #1/sample 1+3, would reveal a relative loss of intensity of those [400 nt and 500 nt] products in the probe #1/sample 1+4 heteroduplex cleavage product due to cleavage at an intervening site. (Note that Table II does not show uncleaved products.)

[0122] Table II indicates, first, that if heteroduplexes can be detected quantitatively (i.e., if two copies of a 100 nt product can be distinguished from one copy), all 10 genotypes can be distinguished by a single hemizygous probe. The possibility exists, however, in many heteroduplex-based variance detection procedures using, for example, T4 endonuclease VII, of some variation in cleavage efficiencies which may affect quantitative analysis of the data.

[0123] The second conclusion that can be drawn from Table II is that, even without precise quantitation, the two haplotypes in the double heterozygote samples (1+4 and 2+3 in Table II) can be distinguished qualitatively. Hence, data generated using this assay can be analyzed both quantitatively and qualitatively. For example, if a genotyping procedure were done on all samples first to determine which individuals were double heterozygotes (and therefore needed to be haplotyped), then a subsequent heteroduplex-based haplotype assay using a hemizygous probe would give unambiguous haplotype results. This two step approach (genotyping first, then haplotyping) is a practical, albeit time-consuming, solution to haplotyping.

[0124] The third conclusion that can be drawn from Table II is that the cleavage product pattern from homozygotes mimics the cleavage product pattern observed in some of the heterozygote samples (for example, a probe produced from allele #1 gives the same size products with both heterozygous 1+4 and homozygous 4+4 in Table II). Such a result may lead to confusion in determining the haplotype of a sample.

[0125] To effectively deal with this situation, the present approach may be carried out with more than one hemizygous probe used in a serial fashion. The aggregate data from multiple probe-sample heteroduplexes greatly facilitates the identification of haplotypes by providing both complementary and redundant data. It is straightforward to determine all haplotypes (even in the absence of genotype data) from, for example, resolvase cleavage patterns. Table II illustrates the complementary patterns of cleavage expected with the use of different hemizygous probes (probes #1-#4). A clear example of the utility of using multiple probes is the instance in which a series of probes have progressively more mispaired bases with the test alleles. The progressive appearance of smaller products (adding up to the size of larger products detected with other probes with fewer mispairs) is an indication that the same allele is being progressively cut, and therefore that all the variances from the hemizygous probe lie on that allele. For example, the presence of a series of heteroduplex cleavage patterns with different probes showing weakening or disappearance of a 350 nt product with appearance of 200 nt+150 nt products, and weakening or disappearance of the 200 nt product with appearance of 120 nt+80 nt products would indicate the mispaired bases responsible for the cleavage of the 350 nt and 200 nt fragments lie on the same allele.
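
The benefit of a second probe can be seen by comparing purely qualitative patterns (which product sizes are present, ignoring intensity) in the idealized complete-cleavage model of Table II. The sketch below is an illustration under that assumption; the lookup of products by kind of probe/allele difference and the numeric allele codes are hypothetical conveniences, and real cleavage data would be less clean.

```python
from itertools import combinations_with_replacement

# Product sizes (nt) on the 600 nt fragment for each kind of difference
# between probe and allele: 0 = none, 1 = one variance only, 2 = the other
# variance only, 3 = both variances. Allele k is coded as k - 1, so the
# kind of difference is the bitwise XOR of the two codes.
PRODUCTS = {
    0: frozenset(),
    1: frozenset({400, 200}),
    2: frozenset({500, 100}),
    3: frozenset({300, 200, 100}),
}

GENOTYPES = list(combinations_with_replacement((1, 2, 3, 4), 2))


def signature(probe, genotype):
    """Qualitative pattern: the set of product sizes seen with one probe."""
    sizes = set()
    for allele in genotype:
        sizes |= PRODUCTS[(probe - 1) ^ (allele - 1)]
    return frozenset(sizes)


def distinct_patterns(probes):
    """Number of genotype classes separable using the given probes together."""
    return len({tuple(signature(p, g) for p in probes) for g in GENOTYPES})


print("probe 1 alone:", distinct_patterns([1]), "of", len(GENOTYPES))
print("probes 1 and 2:", distinct_patterns([1, 2]), "of", len(GENOTYPES))
```

In this model probe #1 alone cannot separate 1+4 from 4+4 without quantitation, while the combined patterns from probes #1 and #2 are different for every genotype.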

[0126] For the purposes of determining haplotypes, two or more hemizygous probes are preferably used in separate experiments on a set of samples to produce and analyze heteroduplexes. In addition, any number of probes, for example, three, or, more preferably, four or more probes, may be utilized for heteroduplex analysis. Note that, although there are only four possible alleles in the case illustrated in FIG. 1 and Table II, a fragment with three variances would have 2³, or 8 possible haplotypes, and a fragment with four variances would have 2⁴, or 16 possible haplotypes, and so on. All possible haplotypes, however, are rarely observed. Hence, the ability to generate numerous hemizygous probes from multiple hemizygous cells using the methods described herein represents an ideal way to produce the multiple probes necessary to determine genotypes from analysis of heteroduplex products.

[0127] All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each independent publication or patent application was specifically and individually indicated to be incorporated by reference.

[0128] Other embodiments are within the claims.

1. A method for detecting a nucleotide mismatch in a nucleic acid sample, said method comprising the steps of: (a) providing a nucleic acid probe derived from a hemizygous cell, said probe being complementary to a hemizygous chromosome or segment thereof present in said hemizygous cell; (b) forming a duplex between said nucleic acid sample and said probe; and (c) determining if said duplex contains a nucleotide mismatch.
2. The method of claim 1, wherein said determining step is carried out using a denaturing gradient gel electrophoresis technique.
3. The method of claim 1, wherein said nucleotide mismatch represents a sequence variance in a population.
4. The method of claim 1, wherein said probe has a known sequence.
5. The method of claim 1, wherein said probe is detectably labeled.
6. The method of claim 1, wherein said hemizygous cell results from the loss of a chromosome or segment thereof.
7. The method of claim 1, wherein said hemizygous cell comprises multiple copies of said hemizygous chromosome or segment thereof.
8. The method of claim 1, wherein said hemizygous cell is human.
9. The method of claim 1, wherein said hemizygous cell is an immortalized cell.
10. The method of claim 1, wherein said hemizygous cell is derived from a complete hydatidiform mole, an ovarian teratoma, an acute lymphocytic leukemia, an acute myeloid leukemia, a solid tumor, a squamous cell lung cancer, an endometrial ovarian cancer, a malignant fibrous histiocytoma, or a renal oncocytoma.
11. The method of claim 1, wherein said hemizygous cell is NALM-16 or KBM-7.
12. The method of claim 1, wherein said hemizygous cell is derived from a haploid germ cell.
13. The method of claim 1, wherein the presence of said nucleotide mismatch correlates with a level of therapeutic responsiveness to a drug or other therapeutic intervention.
14. The method of claim 1, wherein the presence of said nucleotide mismatch indicates a disease or condition, or a predisposition to develop said disease or condition.
15. The method of claim 1, wherein said nucleic acid probe is produced by amplifying at least a portion of said hemizygous chromosome or segment thereof to produce said probe.
16. The method of claim 1, wherein said determining step utilizes a protein that binds or cleaves said nucleotide mismatch.
17. The method of claim 16, wherein said protein is MutS.
18. The method of claim 16, wherein said protein is a resolvase.
19. The method of claim 18, wherein said resolvase is T4 endonuclease VII.
20. The method of claim 1, wherein said determining step utilizes a chemical agent that detects said nucleotide mismatch.
21. The method of claim 1, wherein said method is used to determine the haplotype of said nucleic acid sample.
22. A method for detecting a nucleotide mismatch in a nucleic acid sample, said method comprising the steps of: (a) providing a nucleic acid probe derived from a sex chromosome; (b) forming a duplex between said nucleic acid sample and said probe; and (c) determining if said duplex contains a nucleotide mismatch.
23. A method for detecting a nucleotide mismatch in a nucleic acid sample, said method comprising the steps of: (a) providing a nucleic acid probe derived from a somatic cell hybrid, said probe being complementary to a chromosome or segment thereof, wherein only one allele of said chromosome or segment thereof is present in said somatic cell hybrid; (b) forming a duplex between said nucleic acid sample and said probe; and (c) determining if said duplex contains a nucleotide mismatch.
24. A kit for detecting a nucleotide mismatch, said kit comprising: (a) a nucleic acid probe derived from a hemizygous cell, said probe being complementary to a hemizygous chromosome or segment thereof; and (b) a means for detecting a nucleotide mismatch.
25. The kit of claim 24, wherein said detecting means is a protein that binds or cleaves said nucleotide mismatch.
26. The kit of claim 25, wherein said protein is MutS.
27. The kit of claim 25, wherein said protein is a resolvase.
28. The kit of claim 27, wherein said resolvase is T4 endonuclease VII.
29. The kit of claim 24, wherein said detecting means is a chemical agent that detects said nucleotide mismatch.
30. The kit of claim 24, wherein said probe is detectably labeled.
31. A method for producing a nucleic acid probe for the detection of a nucleotide mismatch, said method comprising the steps of: (a) providing a hemizygous cell having at least one hemizygous chromosome or segment thereof; and (b) amplifying at least a portion of said hemizygous chromosome or segment thereof to produce said probe.
32. A method for producing a nucleic acid probe for the detection of a nucleotide mismatch, said method comprising the steps of: (a) providing nucleic acid from a hemizygous cell having at least one hemizygous chromosome or segment thereof; and (b) using said nucleic acid to produce a probe, said probe being complementary to at least a portion of said hemizygous chromosome or segment thereof.
33. The method of claim 32, wherein said nucleic acid is amplified, said amplified nucleic acid being a representation of the genomic DNA of said hemizygous cell.
34. The method of claim 32, wherein said nucleic acid is an RNA or DNA library.
35. The method of claim 31 or 32, wherein said probe has a known sequence.
36. The method of claim 31 or 32, wherein said method further comprises detectably labeling said probe.
37. The method of claim 31 or 32, wherein said hemizygous cell is human.
38. The method of claim 31 or 32, wherein said hemizygous cell is an immortalized cell.
39. The method of claim 31 or 32, wherein said hemizygous cell is derived from a complete hydatidiform mole, an ovarian teratoma, an acute lymphocytic leukemia, an acute myeloid leukemia, a solid tumor, a squamous cell lung cancer, an endometrial ovarian cancer, a malignant fibrous histiocytoma, or a renal oncocytoma.
40. The method of claim 31 or 32, wherein said hemizygous cell is NALM-16 or KBM-7.
41. The method of claim 31 or 32, wherein said hemizygous cell is derived from a haploid germ cell.
42. A nucleic acid probe for the detection of a nucleotide mismatch, said probe being derived from a hemizygous cell and being complementary to a hemizygous chromosome or segment thereof.
43. The probe of claim 42, said probe being detectably labeled.
44. A nucleic acid probe derived from an autosomal chromosome of a mammalian cell, said probe having a unique sequence.
45. The probe of claim 44, said probe being detectably labeled.